Factor Analysis Discrete Data 2985165


  • 8/14/2019 Factor Analysis Discrete Data 2985165


    Factor Analysis for Categorical Data

    Author(s): D. J. Bartholomew
    Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 42, No. 3 (1980), pp. 293-321
    Published by: Wiley-Blackwell for the Royal Statistical Society
    Stable URL: http://www.jstor.org/stable/2985165

    Accessed: 30/10/2012 08:49


    J. R. Statist. Soc. B (1980), 42, No. 3, pp. 293-321

    Factor Analysis for Categorical Data

    By D. J. BARTHOLOMEW

    London School of Economics and Political Science

    [Read before the ROYAL STATISTICAL SOCIETY at a meeting organized by the RESEARCH SECTION on Wednesday, May 21st, 1980, Professor P. WHITTLE in the Chair]

    SUMMARY

    The method of factor analysis is widely used as an exploratory tool to reduce the dimensionality of multivariate data. The fact that the standard model is strictly applicable only when the manifest variables are scaled is a serious limitation in social science where the variables are often categorical. In this paper we aim to provide a theoretical framework within which methods for the factor analysis of categorical data can be devised and compared. Discussion is restricted to the case of ordered categories where the latent variables are continuous. It is argued that the choice of model should be made from a restricted set which includes two existing models as special cases. A new method is proposed together with a simple approximate technique of fitting for the one-factor model. The paper concludes with an evaluation of existing methods and makes some suggestions about the direction which future research should take.

    Keywords: FACTOR ANALYSIS; LATENT STRUCTURE ANALYSIS; MULTIVARIATE ANALYSIS; CATEGORICAL DATA; MULTI-DIMENSIONAL CONTINGENCY TABLES; DATA REDUCTION; SCALING; ORDINAL DATA

    1. THE BACKGROUND

    AN important object of much multivariate analysis is to reduce the dimensionality of the data. This is particularly desirable in the exploratory stages of an investigation both to provide an intelligible summary and to suggest fruitful lines for model building. When the variables are continuous and measured on a common scale, principal component analysis often serves this purpose. Factor analysis achieves much the same end by setting up a model in which the observed variables are related to a smaller set of latent variables and to an "error". Neither of these methods is directly applicable to categorical variables yet the need for data reduction techniques in such circumstances is no less pressing. This is particularly true in the social sciences where much of the data arising is categorical. The aim of this paper is to provide a framework for the development of methods for use when all the variables are measured on an ordered categorical scale. In the process we shall show how earlier approaches for dichotomous variables arise as special cases of our general formulation.

    We assume that we have a simple random sample of size $N$ whose members are cross-classified on $p$ categorical ordered variables. Ordering is very common but it can always be achieved, at the loss of some information, by reducing each dimension to a dichotomy. The sample data can be set out in a multi-way contingency table whose cell frequencies have a joint multinomial distribution. There is no requirement that any marginal frequencies be fixed. Our aim is to determine whether the $p$-variate representation of the original contingency table can be replaced, without significant loss of information, by one in a smaller number of dimensions. We shall argue that the theoretical framework for this to be done already exists in latent structure analysis, of which normal factor analysis is a special case.

    Latent structure analysis has received little attention from statisticians. It appears to have originated with Lazarsfeld, a sociologist, and is expounded in Lazarsfeld and Henry (1968). A more recent discussion of some aspects from a statistical point of view is contained in Goodman


    (1978) and a useful introduction is provided by Fielding (1977). Sociologists have used latent structure analysis as a tool for investigating and measuring attributes. The raw data in such cases consist of the responses by individuals to questions designed to elicit the attitude under investigation. The interest then lies in whether the multidimensional responses are consistent with the existence of one (or more) underlying attitude scales. Similar problems are encountered by psychologists concerned with abilities rather than attitudes. A key statistical reference is Lord and Novick (1968). Some of this work shares the theoretical foundation of latent structure analysis; Lazarsfeld and Henry (1968) point out that Guttman scaling arises as a special case of their model.

    Although these two fields of application share a common mathematical structure their ways of proceeding are somewhat different. When scaling abilities one starts from the supposition that an ability (such as general intelligence) exists and is capable of being measured on a numerical scale. Response variables are therefore selected which are believed to be related to the underlying dimension of ability. Similar conditions may also apply with attitudes but in sociological inquiries it is more usual to proceed in the reverse direction. That is, the responses are given and the aim is to discover whether there is evidence of one or more underlying dimensions which could account for the response pattern. This is the path usually followed in principal component analysis and exploratory factor analysis and is the one we shall adopt here.

    Latent variable models have not found wide favour with statisticians. Partly this is because of the tedious and somewhat arbitrary methods used for fitting the models. This difficulty has been largely overcome by using computers to implement efficient estimation procedures. More serious have been doubts about the apparent subjectiveness and arbitrariness of the methods used for interpreting the results. Possibly, this is because the substantive and technical aspects of the methods are so closely intertwined that only an expert in the applied field can deploy them effectively. Latent structure models have certainly arisen to meet real needs in psychology and sociology and there is a growing interest among economists (see, for example, Aigner and Goldberger, 1977). I would argue that such models are implicit in most qualitative analyses of social phenomena and that it is the business of statisticians to make their nature explicit and, as far as possible, quantitative. Attention by statisticians is therefore overdue but in the present paper our aim is more modest. It is to take the relatively neglected area of categorical data and consider how best to carry out what may reasonably be described as factor analysis.

    This is not the first attempt to treat the factor analysis of categorical data. Some earlier attempts have aimed, by some means or other, to bring the problem within the framework of the standard normal theory, common factor model. Muthen (1978) is the latest in a group of papers including Bock and Lieberman (1970) and Christoffersson (1975) which deal with the case where all variables are dichotomous. In essence, they do this by supposing that the $2^p$ contingency table arises from grouping each dimension of a $p$-variate multinormal distribution into two categories. The underlying variables are then assumed to have the linear structure of the factor model. This approach is very closely related to that of Lazarsfeld and Henry (1968) and Lord and Novick (1968). All of these models arise as special cases of our general approach.

    McDonald (1969) proposed a method for analysing multi-category data and he also reviewed much of the earlier work. Like us he made the latent structure model his starting point. McDonald's method does not utilize the ordering of the categories; neither does that of Bock (1972), based on a multivariate logistic model.

    In this paper the aim has been to approach the problem from first principles and the emphasis is on fundamentals rather than computational techniques. Much more remains to be done, especially on the computational side, but the results achieved so far are encouraging.

    2. THE MATHEMATICAL FRAMEWORK

    2.1. Terminology and Notation

    We begin by setting the problem in its general context and then introduce categorical data as a special case. The variables which we observe will be called manifest variables and are denoted


    by $x = (x_1, x_2, \ldots, x_p)^T$. A latent structure model supposes these variables to be related to a set of $q$ unobservable latent variables denoted by $y = (y_1, y_2, \ldots, y_q)^T$. For the model to be practically useful $q$ needs to be much smaller than $p$. The relationship between $x$ and $y$ is stochastic and may be expressed by a conditional probability function $\pi(x \mid y)$, being the distribution of $x$ given $y$. This will be a density or probability mass according as $x$ is continuous or categorical. The problem is to infer something about $y$ from the observed values of $x$. Let $p(y)$ denote the joint distribution of the $y$'s and $f(x)$ that of the $x$'s; then the two are related by

    $$f(x) = \int_R \pi(x \mid y)\, p(y)\, dy, \tag{1}$$

    where $R$ is the range space of the latent variables. After $x$ has been observed our knowledge about $y$ is given by

    $$p(y \mid x) = p(y)\, \pi(x \mid y) / f(x). \tag{2}$$

    The data reduction we are seeking is thus achieved from the fact that the distribution of $y$ is of smaller dimension than that of $x$. In practice we may well be content with some suitable summary measure of the conditional distribution of $y$ such as $E(y \mid x)$.

    In the case of the multi-way contingency table, $x$ will identify a cell of the table and $f(x)$ will be its multinomial probability. We shall label the categories along each dimension by 0, 1, 2, ..., 0 being the "lowest" level, 1 the next and so on. Thus, for example, the designation (0, 2, 1, 3) refers to the cell where the first variable is at level 0, the second at level 2, the third at level 1 and the fourth at level 3. A distinctive feature of our model is that the latent variables are continuous; $p(y)$ and $p(y \mid x)$ are thus densities. This choice is based on the fact that most latent variables which arise in social science discourse are thought of as being continuous. For example, quality of life, standard of living, political hue and aggressiveness are all regarded in this way. The case where the latent variables are better treated as categorical may be handled by latent class analysis, for which see Goodman (1978).

    2.2. Assumptions

    Little progress can be made without some assumptions about the various functions which we have defined. The first assumption we make is that the $y$'s are independent, that is that

    $$p(y) = \prod_{i=1}^{q} p(y_i).$$

    There is no completely compelling reason for this assumption but it makes the analysis easier to carry out and interpret. It therefore seems reasonable to adopt it until practical considerations dictate otherwise. The second assumption is about the form of $p(y_i)$. We shall argue below that this distribution is essentially arbitrary and that the choice may be made to suit our convenience. For this reason we have made it uniform on (0, 1). The justification for this assertion requires us to look more closely at the nature of a latent variable. There seem to be two distinct cases as follows:

    (a) The latent variable may be "real" in the sense that it could, in principle, be measured directly. An example would be some sensitive quantity like personal wealth. To avoid asking the direct question we might ask a battery of questions about possessions and life-style in the hope that they might enable us to identify and scale the underlying variable, wealth in this case. The distribution of wealth is certainly not arbitrary and it would be quite inappropriate to assume that it was uniform. Such cases seem quite rare. More commonly we have the second case.

    (b) The latent variable is not "real", meaning that it could not be measured directly, even in principle. It is a mental construct used to facilitate economy of thought. Attitudes and abilities largely come into this category.


    Since there is no "natural" scale in such cases we are at liberty to construct one to suit our convenience. Since ordering is the highest level of measurement available in the manifest variables it seems reasonable to ask for no more than an ordinal level of measurement of the latent variables also. Such a scale is arbitrary to the extent that any monotonic transformation of the chosen scale would serve equally well. Thus whatever the distribution of the latent variable on a chosen scale it can always be given a desired distribution, such as the uniform, by an appropriate monotonic change of scale.

    The remaining function to be specified is $\pi(x \mid y)$. We make two assumptions about this. The crucial assumption, which is fundamental to the rationale of the method, is that of conditional independence. We assume that

    $$\pi(x \mid y) = \prod_{i=1}^{p} \pi_i(x_i \mid y). \tag{3}$$

    This means that the observed dependence among the $x$'s is wholly explained by their dependence on the $y$'s. Eliminating variation in the latter removes the inter-dependence of the $x$'s. In that sense the association among the $x$'s is fully explained by their dependence on the latent variables. This gives formal expression to the hypothesis that the observed variables are describable in terms of a smaller number of latent dimensions. If (3) were not true it would imply that there was some other variable exerting a common influence on the $x$'s.

    Under the assumptions made so far (1) becomes

    $$f(x) = \int_0^1 \cdots \int_0^1 \prod_{i=1}^{p} \pi_i(x_i \mid y)\, dy = E \prod_{i=1}^{p} \pi_i(x_i \mid y). \tag{4}$$

    The choice of the form of the response function $\pi_i(x_i \mid y)$, sometimes called the trace function, is the final step in the specification of the model.

    3. THE CHOICE OF RESPONSE FUNCTION

    For the application to contingency tables we shall suppose, initially, that each variable is dichotomous. This requirement will be relaxed in a later section. In this case we may write

    $$\pi_i(x_i \mid y) = \{\pi_i(y)\}^{x_i} \{1 - \pi_i(y)\}^{1 - x_i}, \qquad (x_i = 0, 1); \tag{5}$$

    $\pi_i(y)$ is thus the conditional probability of a response at the "upper" level on the $i$th manifest variable (also spoken of as a positive response).

    We begin by listing some properties which the function $\pi_i(y)$ should possess and then consider whether functions exist which meet the specification. Let $\mathcal{F}$ denote the family of acceptable functions; then we claim that $\mathcal{F}$ should possess the following properties:

    (i) 0


    most latent variables considered in practice are such that the probability of response increases, or decreases, in step with changes in the variable. Condition (iii) reflects the arbitrariness of the direction in which the latent variable is measured. For example, we can equally well measure the political spectrum from left to right or vice versa. Condition (iv) is important and arises from the arbitrariness in the direction of the ordering of the categories. Answering "yes" to a question is the same as answering "no" to its negation. This condition ensures that the outcome of the analysis does not depend on which choice we make. Conditions (v) and (vi) ensure that two special cases are included; (v) is the case of complete independence when no reduction in dimensionality is possible; (vi), which is less important, is the case of a perfect scale (Guttman scale). All those with $y$ above $y_0$ respond positively and all those below negatively.

    There is no difficulty about finding functions which satisfy (i)-(vi) but most do not meet (vii)-(ix) and there is a natural conflict between (vii) and (viii). It is not, of course, possible to express $\pi_i(y)$ as a linear function of its parameters without violating (i). Instead we shall consider the class of functions given by

    $$G\{\pi_i(y)\} = \alpha_{i0} + \sum_{j=1}^{q} \alpha_{ij} H(y_j), \qquad (i = 1, 2, \ldots, p). \tag{6}$$

    The coefficients $\{\alpha_{ij}\}$ can thus be interpreted in the usual way as factor loadings. We must select the functions $G$ and $H$ to meet the conditions set out above. If we choose $H$ so that $H(y_j) = -H(1 - y_j)$ condition (iii) is satisfied, the effect being to change the sign of the coefficient when the direction of measurement of $y_j$ is changed. Conditions (i) and (iv) imply that $G^{-1}$ must have the form of the distribution function of a random variable distributed symmetrically about zero. This is equivalent to requiring that $G(v) = -G(1 - v)$ so that both $G$ and $H$ must be selected from the same class of functions.

    In practice the choice is very limited, the commonly used functions being the logit ($\operatorname{logit} v = \log\{v/(1 - v)\}$) and the probit ($\operatorname{probit}(v) = \Phi^{-1}(v)$ where $\Phi$ is the standard normal distribution function). Another possibility is the inverse Cauchy distribution function; the simplest choice is $H(v) = G(v) = v - \tfrac{1}{2}$ but this violates condition (i). The complementary log log function is ruled out by conditions (iv) and (iii). Lord and Novick (1968), who discussed the case $q = 1$, used the logit for $G$ and the probit for $H$. Bock and Lieberman (1970) used the probit in both cases. We shall argue that there are good reasons for preferring the logit function for both $G$ and $H$. Our choice may therefore be expressed as

    $$\operatorname{logit} \pi_i(y) = \log\{\pi_i(y)/(1 - \pi_i(y))\} = \log\{\pi_i/(1 - \pi_i)\} + \sum_{j=1}^{q} \alpha_{ij} \log\{y_j/(1 - y_j)\} \tag{7a}$$

    or

    $$\pi_i(y) = \frac{\pi_i \prod_{j=1}^{q} y_j^{\alpha_{ij}}}{\pi_i \prod_{j=1}^{q} y_j^{\alpha_{ij}} + (1 - \pi_i) \prod_{j=1}^{q} (1 - y_j)^{\alpha_{ij}}} \tag{7b}$$

    $(0 < \pi_i < 1;\ -\infty < \alpha_{ij} < \infty)$


    having $y_j$ uniform with $\pi_i(y)$ as in (7b) and $u_j$ logistic with $\pi_i(u)$ as in (7c). Any attempt to interpret particular forms of either function in physical terms is therefore incapable of receiving empirical support. It may easily be verified that the functions given by (6b) satisfy conditions (i)-(v) but (vi) only holds for $y_0 = \tfrac{1}{2}$ in the limit as $\alpha \to \infty$.

    The parameter $\pi_i$ has a direct and useful interpretation. It is the value of $\pi_i(y)$ when $y_j = \tfrac{1}{2}$ for all $j$ and hence is the probability of a positive response for an individual at the median position on each latent dimension. In that sense, therefore, it is the typical probability of response on the $i$th dimension.
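The interpretation of $\pi_i$ can be checked numerically. The following is a minimal sketch, assuming the logit-logit form of (7a); the function name `response_prob` and the parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def response_prob(y, pi, alpha):
    """Probability of a positive response under the logit-logit form (7a).

    y     : latent positions, each strictly between 0 and 1, shape (q,)
    pi    : the "typical" response probability, 0 < pi < 1
    alpha : loadings alpha_i1, ..., alpha_iq, shape (q,)
    """
    y = np.asarray(y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # logit pi_i(y) = logit(pi_i) + sum_j alpha_ij * logit(y_j)
    eta = np.log(pi / (1.0 - pi)) + np.sum(alpha * np.log(y / (1.0 - y)))
    return 1.0 / (1.0 + np.exp(-eta))

# At the median position on every latent dimension the probability equals pi.
p_median = response_prob([0.5, 0.5], pi=0.7, alpha=[0.8, 1.5])

# Reversing both the latent scales (y -> 1 - y) and the category order
# (pi -> 1 - pi) gives the complementary probability.
p_original = response_prob([0.9, 0.2], pi=0.7, alpha=[0.8, 1.5])
p_reversed = response_prob([0.1, 0.8], pi=0.3, alpha=[0.8, 1.5])
```

Here `p_median` equals `pi`, and `p_original + p_reversed` equals one, which is the symmetry that the conditions on the direction of measurement and the ordering of categories ask for.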

    4. EXPECTED FREQUENCIES, y-SCORES AND THE FIT OF THE MODEL

    In order to fit the model we have to evaluate the expectations given by (4). For the $2^p$ table this involves the determination of integrals of the form

    $$\int_0^1 \cdots \int_0^1 \pi_i(y)\, \pi_j(y) \cdots dy, \tag{8}$$

    where the integrand is a product of such factors. Explicit expressions can be found when $q = 1$ for some particular values of $\alpha$ (among them 1 and 2). However, it is a straightforward matter to evaluate (8) numerically and this is the method we have used in the examples which follow. The computational time increases rapidly with $q$.

    As in factor analysis we may wish to go further and derive the analogue of "factor scores". This may be thought of as finding a mapping of the cells of the contingency table into $q$-dimensional Euclidean space. We approach this problem via the conditional distribution of $y$ given $x$. For the $2^p$ table this has the form

    $$p(y \mid x) = \prod_{i=1}^{p} \{\pi_i(y)\}^{x_i} \{1 - \pi_i(y)\}^{1 - x_i} / f(x). \tag{9}$$

    This tells us how $y$ is distributed given $x$. No single value of $y$ is associated with a given $x$ but we may reasonably take some measure of location of the distribution as a typical value of $y$ for that $x$. Since $y$ is uniformly distributed, the $i$th element of $y$ may be interpreted as the quantile of the distribution on the $i$th latent dimension on which the individual stands. The expectation $E(y_i \mid x)$ is thus the expected proportion of the population "below" an individual with manifest vector $x$. We shall define this to be the y-score of the individual on dimension $i$. It requires the evaluation of integrals of the form

    $$\int_0^1 \cdots \int_0^1 y_r\, \pi_i(y)\, \pi_j(y) \cdots dy \tag{10}$$

    which may be obtained numerically.

    Having fitted the model it is useful to have some measure of how successful we have been. In principal component analysis we do this by computing the proportion of the total variation which is accounted for by each of the components. In a multi-way contingency table a similar measure may be based on measures of goodness of fit. This could be chi-squared but we have preferred to use

    $$A = 2 \sum_i O_i \ln (O_i / E_i) \tag{11}$$

    because it is a linear transform of the log-likelihood; $O_i$ and $E_i$ are the observed and expected frequencies and the summation is taken over all cells of the table. As a base-line for the comparison we take the value of $A$ when the $E$'s are calculated on the assumption of complete independence. This is a measure of the total departure from independence which we hope the model will explain; denote it by $A_0$. Let $A_q$ be the same quantity when the $E$'s are those obtained by fitting a model with $q$ latent variables; then the ratio $(A_0 - A_q)/A_0$ is a measure of how much the original departure from independence is accounted for by the model fitted. If the parameters


    are fitted by an efficient method $A$ will have, approximately, a $\chi^2$-distribution with degrees of freedom ($2^p$ − number of parameters − 1) and the goodness of fit may be judged by this means.
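The quantities of this section can be sketched for the one-factor case ($q = 1$) as follows. This is an illustration under stated assumptions rather than the program used in the paper: it assumes the logit-logit response function, uses Gauss-Legendre quadrature (one possible choice of numerical integration), and all function names are hypothetical.

```python
import itertools
import numpy as np

def _grid(n_nodes):
    # Gauss-Legendre nodes and weights mapped from (-1, 1) to (0, 1)
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    return 0.5 * (nodes + 1.0), 0.5 * weights

def _conditional(x, pi, alpha, y):
    """Product over i of pi_i(y)^x_i (1 - pi_i(y))^(1 - x_i) at each node y."""
    pi = np.asarray(pi, dtype=float)[:, None]
    alpha = np.asarray(alpha, dtype=float)[:, None]
    eta = np.log(pi / (1.0 - pi)) + alpha * np.log(y / (1.0 - y))[None, :]
    prob = 1.0 / (1.0 + np.exp(-eta))
    x = np.asarray(x)[:, None]
    return np.prod(np.where(x == 1, prob, 1.0 - prob), axis=0)

def cell_probability(x, pi, alpha, n_nodes=200):
    """f(x) of (4): the expectation over a uniform latent variable."""
    y, w = _grid(n_nodes)
    return float(np.sum(w * _conditional(x, pi, alpha, y)))

def y_score(x, pi, alpha, n_nodes=200):
    """E(y | x): the ratio of an integral like (10) to f(x)."""
    y, w = _grid(n_nodes)
    like = _conditional(x, pi, alpha, y)
    return float(np.sum(w * y * like) / np.sum(w * like))

def fit_statistic(observed, expected):
    """Statistic (11); empty observed cells contribute nothing."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    mask = observed > 0
    return float(2.0 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask])))

# Illustrative parameter values for a 2^3 table
pi, alpha = [0.6, 0.7, 0.8], [0.5, 1.0, 0.3]
cells = list(itertools.product((0, 1), repeat=3))
probs = np.array([cell_probability(x, pi, alpha) for x in cells])
```

Multiplying `probs` by the sample size gives expected frequencies that can be passed to `fit_statistic` together with the observed counts. For loadings below one the integrand is not smooth at the ends of the interval, so the number of nodes may need to be increased.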

    5. SOME BASIC RESULTS

    Since we wish to reduce the dimensionality of the data as much as possible, a natural way to proceed is to take values of $q$ in order, beginning with $q = 1$, and stop as soon as a good enough fit is obtained. However, increasing $q$ increases the number of parameters to be fitted and there comes a point where the model is under-identified. As the literature on factor analysis testifies, the question of identifiability has subtle features and the same is true of our model.

    The case $q = 1$ presents the least number of problems. It is useful, therefore, to have the following theorem which helps to show whether a one-factor model has any prospect of fitting the data. Let us denote by $R_{ij}$ the cross-product ratio formed from the expected frequencies when the table is collapsed over all dimensions except $i$ and $j$. That is

    $$R_{ij} = \frac{E\pi_i(y)\pi_j(y)\; E(1 - \pi_i(y))(1 - \pi_j(y))}{E\pi_i(y)(1 - \pi_j(y))\; E\pi_j(y)(1 - \pi_i(y))} \tag{12}$$

    ($R_{ij}$ will play a key role later on).

    Theorem 1. If $\pi_i(y)$ and $\pi_j(y)$ are both monotonic non-increasing or non-decreasing then $R_{ij} - 1 \geq 0$, otherwise $R_{ij} - 1 \leq 0$ with equality only if at least one of $\pi_i(y)$ and $\pi_j(y)$ is constant.

    Proof. $R_{ij} - 1 = \{E\pi_i(y)\pi_j(y) - E\pi_i(y)\, E\pi_j(y)\} / \{E\pi_i(y)(1 - \pi_j(y))\; E\pi_j(y)(1 - \pi_i(y))\}$, hence the sign of $R_{ij} - 1$ is the same as that of $d_{ij} = E\pi_i(y)\pi_j(y) - E\pi_i(y)\, E\pi_j(y)$. Now

    $$d_{ij} = \int_0^1 \pi_i(y) \{\pi_j(y) - E\pi_j(y)\}\, dy.$$

    Suppose first that $\pi_j(y)$ is monotonic non-decreasing; then we can find $y = y^*$ such that $\pi_j(y) \leq E\pi_j(y)$ for $y \leq y^*$ and $\pi_j(y) \geq E\pi_j(y)$ for $y \geq y^*$ so that $d_{ij}$ may be written

    $$d_{ij} = \int_0^{y^*} \pi_i(y) \{\pi_j(y) - E\pi_j(y)\}\, dy + \int_{y^*}^1 \pi_i(y) \{\pi_j(y) - E\pi_j(y)\}\, dy.$$

    If $\pi_i(y)$ is also monotonic non-decreasing

    $$d_{ij} \geq \pi_i(y^*) \int_0^{y^*} \{\pi_j(y) - E\pi_j(y)\}\, dy + \pi_i(y^*) \int_{y^*}^1 \{\pi_j(y) - E\pi_j(y)\}\, dy = 0.$$

    If both functions are monotonic non-increasing a similar argument leads to the same conclusion; otherwise the inequality is reversed. Equality obviously occurs only when one or both functions are constant.

    The practical relevance of this theorem is as follows. If we reverse the order of the categories on dimension $i$ the probability of a positive response will be $1 - \pi_i(y)$ instead of $\pi_i(y)$. If $\pi_i(y)$ was formerly decreasing the corresponding probability for that dimension with the categories reversed will be increasing. In the one-factor case, therefore, it must be possible to re-order the dimensions so that the response functions either all increase or all decrease. If this is done all the $(R_{ij} - 1)$'s will be positive. For a one-factor model to be appropriate it is thus necessary (but not sufficient) that an ordering of the manifest dimensions exists such that all the cross-product ratios are greater than one. Suppose, for example, that a table with $p = 4$ has $R_{12} > 1$; $R_{13}, R_{14}, R_{23}, R_{24} < 1$; $R_{34} > 1$. Reversing the categories on a dimension changes the sign of $R_{ij} - 1$. In this case reversing the order on dimensions 1 and 2 will produce a set of positive values.

    In practice, of course, the $R_{ij}$'s have to be estimated from the sample cross-product ratios and so the signs cannot be determined with certainty. Nevertheless, if all or most of the $(R_{ij} - 1)$'s can be made positive it is worth trying a one-factor model.
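The sign check suggested by Theorem 1 can be sketched in code. The functions below are hypothetical helpers, not part of the paper: `cross_product_ratios` computes the sample ratios from a matrix of 0/1 responses (assuming no pairwise cell is empty), and `find_reversals` searches by brute force, which is only practical for small $p$, for a set of dimensions whose reversal makes every ratio exceed one.

```python
import itertools
import numpy as np

def cross_product_ratios(data):
    """Sample cross-product ratio for every pair of binary columns."""
    data = np.asarray(data)
    p = data.shape[1]
    R = np.ones((p, p))
    for i, j in itertools.combinations(range(p), 2):
        n11 = np.sum((data[:, i] == 1) & (data[:, j] == 1))
        n00 = np.sum((data[:, i] == 0) & (data[:, j] == 0))
        n10 = np.sum((data[:, i] == 1) & (data[:, j] == 0))
        n01 = np.sum((data[:, i] == 0) & (data[:, j] == 1))
        R[i, j] = R[j, i] = (n11 * n00) / (n10 * n01)
    return R

def find_reversals(R):
    """Dimensions (0-based) to reverse so that all R_ij > 1, or None."""
    p = R.shape[0]
    for flips in itertools.product((0, 1), repeat=p):
        all_positive = True
        for i, j in itertools.combinations(range(p), 2):
            # Reversing exactly one member of a pair inverts R_ij;
            # reversing both, or neither, leaves it unchanged.
            r = R[i, j] if flips[i] == flips[j] else 1.0 / R[i, j]
            if r <= 1.0:
                all_positive = False
                break
        if all_positive:
            return tuple(k for k in range(p) if flips[k])
    return None

# The p = 4 pattern discussed in the text, with illustrative magnitudes
R = np.ones((4, 4))
pattern = {(0, 1): 2.0, (0, 2): 0.5, (0, 3): 0.5,
           (1, 2): 0.5, (1, 3): 0.5, (2, 3): 2.0}
for (i, j), value in pattern.items():
    R[i, j] = R[j, i] = value
reversal = find_reversals(R)
```

For this pattern the search returns the third and fourth dimensions. Reversing the complementary set, dimensions 1 and 2 as in the text, is equivalent, because only the relative orientation of each pair matters.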


    The following theorem holds for all members of the chosen family of response functions. It provides the basis for the approach to estimation proposed in the next section and it also serves to display a feature which links various methods of factor analysis together.

    Theorem 2.

    $$E\pi_i(y)\pi_j(y) - E\pi_i(y)\, E\pi_j(y) = \sigma^2\, G^{-1\prime}(\alpha_{i0})\, G^{-1\prime}(\alpha_{j0}) \sum_{k=1}^{q} \alpha_{ik}\alpha_{jk} + \text{terms of the 4th degree in } \alpha_i \text{ and } \alpha_j,$$

    where $\sigma^2 = EH^2(y_j)$ $(i, j = 1, 2, \ldots, p;\ i \neq j)$.

    The proof is based on a straightforward Taylor expansion of the response function and term by term integration. The left-hand side is the predicted covariance between $x_i$ and $x_j$ and the theorem shows that this has a simple form if the departure from complete independence, as measured by the $\alpha$'s, is small. The covariances in the normal theory factor model have the form $\sum_{k=1}^{q} \alpha_{ik}\alpha_{jk}$ and this suggests we should look for sample functions which are estimates of the quantities

    $$E\pi_i(y)\pi_j(y) - E\pi_i(y)\, E\pi_j(y). \tag{13}$$

    If such can be found they would have the same structure (for small $\alpha$'s) as in the normal case and hence known methods of estimation could be used. If we take the logit form for $G$ and $H$ it turns out that

    $$R_{ij} - 1 = \sigma^2 \sum_{k=1}^{q} \alpha_{ik}\alpha_{jk} + \text{terms of 4th degree}, \tag{14}$$

    where $\sigma^2 = E\{\log^2 y/(1 - y)\} = 3.289868$. The sample cross-product ratio can be used to estimate $R_{ij}$ and hence the $\alpha$'s via (14). Since cross-product ratios are the "natural" measures of association in $2^p$ contingency tables it is satisfying to find them arising in this context, thus supporting the choice of the logit function.

    If the probit functions are chosen then (13) is a first approximation to the tetrachoric correlation coefficients which may be taken as partly justifying the heuristic method which carries out a normal factor analysis on these coefficients.

    6. FITTING THE ONE-FACTOR LOGIT MODEL TO THE $2^p$ TABLE

    6.1. Methods

    Since the logit and probit functions are very similar one would expect both versions of the general model we have considered to give similar results and to involve about the same amount of calculation. The logit function is easier to compute than the probit but this advantage is likely to be fairly marginal. Bock and Lieberman (1970) developed a maximum likelihood method for the probit model and illustrated it on two examples with $p = 5$ and $q = 1$. The method involved extensive numerical integration and they suggested that it would not be feasible for $p$ in excess of 10 or 12. Christoffersson (1975) found a faster method using a least squares fit of $E\pi_i(y)$ and $E\pi_i(y)\pi_j(y)$ $(i, j = 1, 2, \ldots, p)$ to their sample estimates. Muthen (1978) has made further improvements on this method by substantially reducing the amount of numerical integration required. It appears from this work that little information is lost by using only the first and second order margins for estimation. Programs could be provided for the logit model using the same methods and they would presumably involve similar amounts of computation.

    However, the logit model has an important property which often makes it possible to obtain a simple approximate solution when $q = 1$. This solution also offers a good starting point for an iterative procedure by which it can be improved. The basis of the method rests on the fact that the approximation given by (14) is remarkably good even when the $\alpha$'s are far beyond the range


    when they can be described as "small". Table 1 gives values of $(R_{ij} - 1)/\alpha_i \alpha_j \sigma^2$ for various combinations of $(\pi_i, \pi_j)$ and $(\alpha_i, \alpha_j)$. The approximation is to be judged by the closeness of the ratios to 1 (the second subscript on $\alpha$ has been dropped).

    TABLE 1

    Values of $(R_{ij} - 1)/\alpha_i \alpha_j \sigma^2$. The entries are unchanged if $(\pi_i, \pi_j)$ is replaced by $(1 - \pi_i, 1 - \pi_j)$ and $(\alpha_i, \alpha_j)$ by $(\alpha_j, \alpha_i)$

                             $(\pi_i, \pi_j)$
    $(\alpha_i, \alpha_j)$   (1/2, 1/2)   (1/4, 3/4)   (1/4, 1/4)   (1/10, 1/10)   (1/20, 1/20)

    (2, 2)                   0.942        1.192        0.984        1.119          1.280
    (2, 1)                   0.801        0.944        0.850        1.008          1.196
    (2, 1/2)                 0.614        0.668        0.644        0.731          0.820
    (1, 1)                   0.912        1.063        0.988        1.245          1.576
    (1, 1/2)                 0.846        0.934        0.912        1.125          1.372
    (1/2, 1/2)               0.935        1.011        1.015        1.263          1.535
    (1/2, 1/4)               0.917        0.965        0.977        1.139          1.288
    (1/4, 1/4)               0.971        1.001        1.016        1.119          1.188
    (1/10, 1/10)             0.994        1.000        1.004        1.018          1.003
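The ratio tabulated in Table 1 can be recomputed numerically. The following is a minimal sketch for $q = 1$, assuming the logit-logit response function and Gauss-Legendre quadrature; the function name is illustrative. It uses the fact that $\sigma^2 = E\{\log^2 y/(1 - y)\}$ for uniform $y$ equals $\pi^2/3 \approx 3.289868$, where $\pi$ here is the mathematical constant.

```python
import numpy as np

# Second moment of logit(y) for uniform y, i.e. of a standard logistic variable
SIGMA2 = np.pi ** 2 / 3

def approximation_ratio(pi_i, pi_j, alpha_i, alpha_j, n_nodes=400):
    """(R_ij - 1) / (alpha_i * alpha_j * sigma^2) for the one-factor logit model."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    y, w = 0.5 * (nodes + 1.0), 0.5 * weights
    logit_y = np.log(y / (1.0 - y))

    def response(pi, alpha):
        return 1.0 / (1.0 + np.exp(-(np.log(pi / (1.0 - pi)) + alpha * logit_y)))

    a, b = response(pi_i, alpha_i), response(pi_j, alpha_j)
    e_a, e_b, e_ab = np.sum(w * a), np.sum(w * b), np.sum(w * a * b)
    # Cross-product ratio (12) written in terms of the three expectations
    R = e_ab * (1.0 - e_a - e_b + e_ab) / ((e_a - e_ab) * (e_b - e_ab))
    return (R - 1.0) / (alpha_i * alpha_j * SIGMA2)

ratio_half = approximation_ratio(0.5, 0.5, 1.0, 1.0)
ratio_quarter = approximation_ratio(0.25, 0.25, 1.0, 1.0)
ratio_tenth = approximation_ratio(0.1, 0.1, 1.0, 1.0)
```

When $\alpha_i = \alpha_j = 1$ and $\pi_i = \pi_j = \tfrac{1}{2}$ the response function is simply $y$, so $R_{ij} = 4$ and the ratio is exactly $9/\pi^2 \approx 0.912$. The other two values should come out close to 0.988 and 1.245, in line with the printed entries in the (1, 1) row.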

    The worst cases occur when $\alpha_i$ and $\alpha_j$ are far apart and when $\pi_i$ and $\pi_j$ are small. Only positive values of $\alpha_i$ and $\alpha_j$ have been considered for reasons which will emerge below. $R_{ij} - 1$ is not the only function of the expectations which has $\alpha_i \alpha_j \sigma^2$ as the first term of its expansion. The same is true, for example, of $\ln R_{ij}$. Unfortunately the approximation is much less good for this, and other functions, which we have investigated.

    The basis of our method of estimation is to find estimates of $\alpha$ and $\pi$ such that (a) the cross-product ratios for the model are as close as possible to those observed and (b) the marginal proportions of the model and the data agree exactly. The $\alpha$'s are found iteratively using the result of (14) as a starting point. We proceed by the following steps.

    (1) Find a vector $\alpha$ such that $\alpha_i \alpha_j$ is as close as possible to the estimated values of $c_{ij} = (R_{ij} - 1)/\sigma^2$ for $i, j = 1, 2, \ldots, p$; $i \neq j$.
    (2) Find $\pi$ by equating $E\pi_i(y)$ $(i = 1, 2, \ldots, p)$ to the corresponding marginal proportion using the vector $\alpha$ obtained in step 1.
    (3) Improve the estimate of $\alpha$ by a method to be described.
    (4) Re-estimate $\pi$ using the improved $\alpha$.
    (5) Repeat the cycle until $\pi$ and $\alpha$ (or $A$) converge.

    If many cycles of the iteration are required the amount of numerical integration required will prevent the method being used on a routine basis with present-day computers if $p$ is, say, greater than 10. However, we shall show that the first approximation is often quite adequate for practical purposes. This can be obtained rapidly on a computer with no practical limit on $p$. The calculation of the expected frequencies from the estimated parameters does require numerical integration (with all methods) and this may take a considerable time for large $p$.

    The method requires $p \geq 3$; if $p > 3$ it is not possible to reproduce the $c_{ij}$'s exactly so we must find an $\alpha$ such that the distance between the $c_{ij}$'s and the $\alpha_i \alpha_j$'s is as small, in some sense, as possible. Precisely this problem arises in normal theory factor analysis in which context the $c_{ij}$'s are covariances and the $\alpha$'s are factor loadings. One solution, also applicable here, is based on minimizing

    $$\sum_{i=1}^{p} \sum_{\substack{j=1 \\ j \neq i}}^{p} (c_{ij} - \alpha_i \alpha_j)^2.$$


    It is an iterative procedure known as the "minres" method and will be found, for example, in Harman (1970).

    An alternative method which is both intuitively appealing and easy to apply is as follows. We shall call it the row and column method. Consider the matrix with elements $\{\alpha_i \alpha_j\}$ $(i, j = 1, 2, \ldots, p)$. This matrix has the property that

    $$(i, j)\text{th element} = (\text{Row total})(\text{Column total})/\text{Grand total}. \tag{15}$$

    If we regard the $c_{ij}$'s as estimates of the off-diagonal elements we can treat the estimation problem as one of finding diagonal elements for this matrix such that (15) holds. We can, in fact, ensure that (15) holds for every diagonal element. The $\alpha$'s we seek will therefore satisfy the equations

    $$\alpha_i^2 = \frac{(C_i + \alpha_i^2)^2}{C + \sum_{j=1}^{p} \alpha_j^2} \qquad (i = 1, 2, \ldots, p), \tag{16}$$

    where

    $$C_i = \sum_{\substack{j=1 \\ j \neq i}}^{p} c_{ij}, \qquad C = \sum_{i=1}^{p} C_i.$$

    These equations are equivalent to

    $$\alpha_i \Big( \sum_{j=1}^{p} \alpha_j - \alpha_i \Big) = C_i \qquad (i = 1, 2, \ldots, p). \tag{17}$$

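    They have considerable appeal in their own right as a means of estimation since they result from equating the off-diagonal row totals to their observed values. Their solution is simplest if the $\alpha$'s are all of the same sign. On this question we have the following lemma:

    Lemma. The two $p$-vectors which satisfy (17) have elements of the same sign if and only if $C_i \geq 0$ for all $i$.

    Before proceeding with the method of solution it is therefore necessary to ensure that $C_i \geq 0$ $(i = 1, 2, \ldots, p)$ by changing the order of categories, if necessary. Writing $A = \sum_{i=1}^{p} \alpha_i$, (17) becomes

    $$\alpha_i (A - \alpha_i) = C_i \quad \text{or} \quad \alpha_i^2 - A\alpha_i + C_i = 0 \tag{18}$$

    which yields

    $$\alpha_i = \tfrac{1}{2}A \pm \tfrac{1}{2}(A^2 - 4C_i)^{1/2} \qquad (i = 1, 2, \ldots, p). \tag{19}$$

    If we sum both sides of this expression over $i$ we shall obtain an equation for $A$. Once this is solved (19) will provide estimates of $\alpha_i$ $(i = 1, 2, \ldots, p)$. There is an ambiguity of sign involved in (19) which leads to two alternative equations for $A$. $A$ is the real root, where it exists, of

    $$p - 2 = \sum_{i=1}^{p} (1 - 4C_i/A^2)^{1/2}. \tag{20}$$

    Otherwise $A$ is the real root of

    $$p - 2 = \sum_{i=1}^{p-1} (1 - 4C_i/A^2)^{1/2} - (1 - 4C_p/A^2)^{1/2}. \tag{21}$$

    It is not immediately obvious in what sense this procedure minimizes the distance between the $c_{ij}$'s and the products $\alpha_i \alpha_j$. This becomes apparent when we observe that the same estimating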


    equations result from maximizing

    $$\sum_{i=1}^{p} \sum_{\substack{j=1 \\ j \neq i}}^{p} c_{ij} \ln \Big\{ \alpha_i \alpha_j \Big/ \sum_{i=1}^{p} \sum_{\substack{j=1 \\ j \neq i}}^{p} \alpha_i \alpha_j \Big\} \tag{22}$$

    with respect to the $\alpha$'s. The greatest possible value of (22) is $\sum \sum c_{ij} \log (c_{ij}/C)$ so by making (22) as near to this value as we can we are achieving the best fit in a certain sense. For the solution of (17) to be a maximum it is necessary for $C_i > 0$ $(i = 1, 2, \ldots, p)$. This also ensures that the $\alpha$'s will all have the same sign and thus that the argument of the logarithm in (22) is positive.

    Having obtained $\hat{\alpha}$ we next estimate the $\pi_i$'s from

    $$\frac{N_i}{N} = \int_0^1 \frac{\pi_i\, y^{\hat{\alpha}_i}}{\pi_i\, y^{\hat{\alpha}_i} + (1 - \pi_i)(1 - y)^{\hat{\alpha}_i}}\, dy, \qquad (i = 1, 2, \ldots, p), \tag{23}$$

    where $N_i$ is the number of positive responses on dimension $i$. These equations may be solved iteratively by the usual Newton-Raphson method using $\pi_i = N_i/N$ as a starting value.

    The method so far rests on the supposition that $c_{ij} = \alpha_i \alpha_j$. This is only an approximation so we next write $c_{ij} = \alpha_i \alpha_j \theta_{ij}$ where $\theta_{ij}$ depends weakly on $\pi_i$, $\pi_j$, $\alpha_i$ and $\alpha_j$ but will usually be close to 1. Using the estimates of $\alpha$ and $\pi$ already obtained we next estimate $\theta_{ij}$ by

    $$\hat{\theta}_{ij} = c_{ij}(\hat{\pi}_i, \hat{\pi}_j, \hat{\alpha}_i, \hat{\alpha}_j) / \hat{\alpha}_i \hat{\alpha}_j \qquad (i, j = 1, 2, \ldots, p;\ i \neq j), \tag{24}$$

    $c_{ij}(\hat{\pi}_i, \hat{\pi}_j, \hat{\alpha}_i, \hat{\alpha}_j)$ being the value of $c_{ij}$ when the parameters are given the same values as the preliminary estimates. The cycle of estimation is now repeated by replacing the starting values $c_{ij}$ by $c_{ij}/\hat{\theta}_{ij}$.

    It can happen, as one of the examples below shows, that the estimates of $\alpha$ do not appear to converge at all. The possibility of this is apparent from the fact that there is nothing in the model to prevent one or more $\alpha$'s being infinite. In such a case an iterative procedure starting from finite values may never terminate. This feature is not as serious as it might seem. When an $\alpha$ is large, greater than 2 say, big changes in $\alpha$ produce only small changes in the shape of $\pi_i(y)$ and hence in the overall fit of the model. From the point of view of interpretation all that matters is that the $\alpha$ in question is "large". In practice, therefore, no difficulty arises if we stop the iteration as soon as no worth-while improvement in the fit is obtained.

    In most cases we have investigated, where the $\alpha$'s turn out to be small and of the same order of magnitude, convergence of the parameter estimates is rapid. This is especially true of $\hat{\pi}$.
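Steps (1) and (2), which together give the first approximation, can be sketched as follows. This is an illustration under stated assumptions, not the author's program. `row_and_column` handles only the case in which (20) has a real root and raises an error otherwise. `match_margin` solves (23) by bisection, which is swapped in here for the Newton-Raphson iteration mentioned in the text, and evaluates the integral by Gauss-Legendre quadrature. All names are hypothetical.

```python
import numpy as np

def row_and_column(c):
    """Solve (17) for the loadings via (20) and (19), given a symmetric matrix c_ij."""
    c = np.asarray(c, dtype=float)
    p = c.shape[0]
    C = c.sum(axis=1) - np.diag(c)          # off-diagonal row totals C_i
    if np.any(C <= 0):
        raise ValueError("need C_i > 0; reverse the category order on some dimensions")

    def g(A):
        # Right-hand side of (20) minus the left-hand side
        return np.sum(np.sqrt(np.clip(1.0 - 4.0 * C / A ** 2, 0.0, None))) - (p - 2)

    lo = 2.0 * np.sqrt(C.max())             # smallest A for which every root is real
    if g(lo) > 0:
        raise ValueError("(20) has no real root; (21) would be needed instead")
    hi = 2.0 * lo
    while g(hi) < 0:                        # g increases towards 2 as A grows
        hi *= 2.0
    for _ in range(200):                    # bisection on the monotone function g
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    A = 0.5 * (lo + hi)
    return 0.5 * A - 0.5 * np.sqrt(np.clip(A ** 2 - 4.0 * C, 0.0, None))

def match_margin(proportion, alpha, n_nodes=200):
    """Find pi_i such that E pi_i(y) equals the observed proportion N_i / N."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    y, w = 0.5 * (nodes + 1.0), 0.5 * weights
    logit_y = np.log(y / (1.0 - y))

    def margin(pi):
        eta = np.log(pi / (1.0 - pi)) + alpha * logit_y
        return np.sum(w / (1.0 + np.exp(-eta)))

    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(100):                    # the margin is increasing in pi
        mid = 0.5 * (lo + hi)
        if margin(mid) < proportion:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# When c_ij is exactly alpha_i * alpha_j the loadings are recovered
alpha_true = np.array([0.5, 0.4, 0.3, 0.6])
alpha_hat = row_and_column(np.outer(alpha_true, alpha_true))
```

In use, the matrix passed to `row_and_column` would hold the sample values of $(R_{ij} - 1)/\sigma^2$, and `match_margin` would then be called once per dimension with the observed proportion $N_i/N$ and the corresponding estimated loading.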

6.2. Examples

To illustrate the use of the one-factor logit model and to compare it with the probit method we shall give the results of fitting the model to seven sets of data. Two of these were used by Bock and Lieberman (1970), Christoffersson (1975) and Muthén (1978). They relate to 1000 cases on each of Sections VI and VII of the Law School Admission Test (LSAT). Background details and the original data are in Bock and Lieberman (1970). The results of fitting the logit and probit models are given in Table 2. For the logit we give the first approximation obtained from (14) and the final estimates after iteration. For the probit model we give Bock and Lieberman's maximum likelihood estimates, Muthén's generalized least squares (GLS) estimates and Muthén's unweighted least squares (ULS) estimates. The latter are obtained by doing a standard factor analysis on the tetrachoric correlations obtained from the table. We have re-parameterized the probit model to conform with (6) as explained later.

For Section VI the fit by all methods is excellent, with A almost equal to its expectation. The differences between the various parameter estimates are negligible and would have no effect on the interpretation of the factor. On these grounds there is therefore nothing to choose between the logit and probit models. In the case of Section VII the fit is less good and there is greater variation in the estimates but, again, these are not sufficient to affect the interpretation of the analysis.

TABLE 2
Comparison of parameter estimates and goodness of fit for the LSAT data using the probit and logit models

                      Logit                         Probit
             First           Final       Maximum      Muthén    Muthén
             approximation   estimate    likelihood   (GLS)     (ULS)
Section VI
  α1         0.460           0.410       0.418        0.417     0.402
  α2         0.431           0.424       0.433        0.455     0.448
  α3         0.516           0.538       0.537        0.510     0.550
  α4         0.401           0.391       0.404        0.457     0.402
  α5         0.373           0.351       0.359        0.380     0.345
  π1         0.941           0.938       0.924        0.925     0.924
  π2         0.731           0.730       0.709        0.707     0.709
  π3         0.562           0.563       0.552        0.555     0.552
  π4         0.785           0.784       0.763        0.762     0.763
  π5         0.887           0.885       0.870        0.870     0.870
  A          21.24           21.17       21.28
Section VII
  α1         0.663           0.604       0.560        0.588     0.609
  α2         0.574           0.581       0.648        0.667     0.598
  α3         0.898           0.907       0.986        0.959     0.922
  α4         0.455           0.465       0.462        0.480     0.480
  α5         0.444           0.420       0.411        0.413     0.430
  π1         0.876           0.870       0.828        0.828     0.828
  π2         0.687           0.688       0.658        0.657     0.658
  π3         0.848           0.849       0.772        0.775     0.772
  π4         0.620           0.621       0.606        0.606     0.606
  π5         0.868           0.865       0.843        0.843     0.843
  A          33.1            32.21       31.59

From the computational point of view the logit "first approximation" is the simplest and can easily be carried out with a pocket calculator for problems of this size. In that case the integration in the solution of (23) can be avoided by using a normal approximation given by

$$\hat\pi_i = \Phi\{(1+\hat\alpha_i^2)^{1/2}\,\Phi^{-1}(N_i/N)\}. \tag{25}$$

The computing effort required for the final estimates depends on how many cycles of the iteration are necessary and this, in turn, depends on the accuracy required. No exact comparisons have yet been made with the various probit methods, but the procedure seems likely to be faster than the maximum likelihood method and slower than Muthén's least squares method. The computation of the expected frequencies and the y-scores involves calculations of the same order for both models, though here the logit has the slight advantage of being easier to calculate than the probit.

In social, as opposed to psychometric and educational, applications, $p$ is often quite small and what is wanted is a simple method of extracting one or two factors and a way of providing a scale of measurement for the latent variables. We therefore give five further examples of this kind, in which our main aim will be to see how good the first approximation is and to illustrate problems which may arise in the fitting process. The sets of data used are as follows.

Set I is taken from Lombard and Doering (1947) and it relates to knowledge about cancer. A sample of 1729 individuals were classified on four dimensions concerning sources of general knowledge, each having two categories as follows: (1) Radio/no radio; (2) Newspapers/no newspapers; (3) Solid reading/no solid reading; (4) Lectures/no lectures. A fifth variable was whether or not the respondent had a good knowledge of cancer. Here we shall look only at the first four variables to see whether there is evidence of a single latent variable which, we might anticipate, would have to do with how well informed people are in general.

Sets II and III are from Solomon (1961) and concern attitudes to science expressed by 2982 young people. They were divided into two equal groups on the basis of their IQ (High = II, Low = III). Attitudes were elicited in the form of positive or negative responses to four questions. The data and the questions are reproduced in Plackett (1974), which also contains Set I.

Set IV is a $2^5$ table from Upton (1978), where the questions from a survey on entry to the EEC were such as might be expected to relate to a latent political left/right variable.

Set V is taken from a study of mobility of the elderly and it is included here as an example which gives rise to problems in fitting.

TABLE 3
Parameter estimates and goodness of fit of the logit model for the five cases described in the text

                 I                II               III†              IV               V
            First   Final    First   Final    First   Final    First   Final    First   Final
α1          0.444   0.445    0.169   0.195    0.168   0.164    0.962   0.986    2.757   1.695
α2          1.228   1.550    0.448   0.400    0.097   0.161    0.351   0.397    2.682   2.382
α3          0.864   0.860    0.818   1.068    1.143   15.225   0.546   0.571    0.275   0.457
α4          0.506   0.456    0.217   0.223    0.242   0.168    0.998   1.074    0.386   0.594
α5          -       -        -       -        -       -        0.493   0.520    -       -
π1          0.213   0.212    0.818   0.819    0.839   0.839    0.704   0.707    0.008   0.037
π2          0.604   0.620    0.174   0.178    0.169   0.167    0.454   0.453    0.526   0.524
π3          0.461   0.464    0.646   0.664    0.526   0.732    0.469   0.467    0.237   0.221
π4          0.057   0.061    0.543   0.543    0.446   0.448    0.703   0.709    0.411   0.402
π5          -       -        -       -        -       -        0.389   0.387    -       -
A           23.71   19.15    11.80   11.10    17.03   12.92    89.83   90.40    33.31   21.62
Degrees of
freedom         7                7                7               21                7

† After 5 cycles.

In all cases, except II, the value of A suggests that a further factor might be involved, but the reduction in A from the case of complete independence is always very substantial. The first approximation is good on the whole, though there are several marked discrepancies to which we turn in a moment.

In case III the estimates for $\alpha$ were not converging and the iteration was stopped after 5 cycles. At this point the value of A had virtually converged. In other words, the fit at this stage was hardly affected by changing the parameters. The extreme case, $\alpha = \infty$, is equivalent to what is called a Heywood case in factor analysis, where there is no uncertainty in the response once the value of the latent variable is fixed. It is this kind of response function which is assumed in Guttman scaling and there seems to be no reason for regarding it as anomalous in our model. In this case our first approximation is less good but it still shows the same basic pattern.

Case IV is interesting in that, although the fit of the one-factor model is poor, the first approximation actually provides a slightly better fit than the full iteration.

Case V arises from a $2^4$ table where one cross-product ratio was very large (27.3) whereas the others were no larger than 4. Here it took 26 iterations to converge and there was a marked oscillation from one cycle to the next in the early stages. Even here, the first approximation gives the broad outline of the solution and the estimates of $\pi$ are particularly good. We have other examples with very large cross-product ratios, including one for $p = 7$, where the iteration appears to diverge with many of the estimated elements tending to zero. This appears to be because some of the $\pi_i(y)$'s have the Guttman form with $\pi_i$ (see condition (vi)) near 0 or 1. Neither the logit nor the probit model can adequately cope with that situation.

A full analysis requires the calculation of expected frequencies and y-scores. These are given for case I in Table 4.

TABLE 4
Fit of the one-factor model to Lombard and Doering's data on cancer knowledge

         Observed     Independence    Fitted
Cell     frequency    frequency†      frequency    y-score
0000       477           279.1          466.5       0.212
0001        12            23.6           16.1       0.304
0010       150           251.8          156.4       0.384
0011        11            21.3            8.6       0.475
0100       231           359.2          250.1       0.522
0101        13            30.4           18.9       0.615
0110       378           324.1          355.5       0.700
0111        45            27.4           44.0       0.797
1000        63            87.3           67.1       0.304
1001         7             7.4            3.0       0.394
1010        32            78.8           35.8       0.475
1011         4             6.7            2.4       0.567
1100        94           112.4           78.8       0.615
1101        12             9.5            7.5       0.711
1110       169           101.4          182.9       0.796
1111        31             8.6           34.4       0.889
Total     1729          1729.0         1729.0
A                        380.57          19.15

† These are the expected frequencies on the assumption of complete independence between the manifest variables.
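The "independence" column of Table 4 can be checked directly from the observed frequencies. The following sketch, with variable names of our own choosing, computes the expected frequencies under complete independence from the one-way margins, together with a likelihood-ratio statistic $2\sum O\log(O/E)$. Treating A as this statistic is an assumption, although it appears to agree with the tabulated value for the independence fit.

```python
import math

# Observed frequencies from Table 4; each digit of the cell label is the
# response (1 = positive) on one of the four manifest variables.
observed = {
    "0000": 477, "0001": 12, "0010": 150, "0011": 11,
    "0100": 231, "0101": 13, "0110": 378, "0111": 45,
    "1000": 63, "1001": 7, "1010": 32, "1011": 4,
    "1100": 94, "1101": 12, "1110": 169, "1111": 31,
}
N = sum(observed.values())

# Proportion of positive responses on each variable (one-way margins).
margins = [
    sum(n for cell, n in observed.items() if cell[i] == "1") / N
    for i in range(4)
]


def independence_frequency(cell):
    # Product of the marginal probabilities for this cell, scaled by N.
    prob = 1.0
    for digit, m in zip(cell, margins):
        prob *= m if digit == "1" else 1.0 - m
    return N * prob


expected = {cell: independence_frequency(cell) for cell in observed}

# Likelihood-ratio statistic (assumed form of the goodness-of-fit measure).
lr_statistic = 2.0 * sum(
    n * math.log(n / expected[cell]) for cell, n in observed.items()
)

if __name__ == "__main__":
    for cell in sorted(observed):
        print(cell, observed[cell], round(expected[cell], 1))
    print("LR statistic:", round(lr_statistic, 2))
```

The same function, applied to fitted cell probabilities from the one-factor model instead of the product of margins, would give the corresponding statistic for the fitted column.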

Although the one-factor model is barely adequate it is a great improvement over the "independence" fit. The y-scores provide a useful ranking of the cells according to the degree of knowledge which the cell members exhibit.

7. MANIFEST VARIABLES WITH MORE THAN TWO CATEGORIES

7.1. Specification of the Response Function

When there are more than two categories in any dimension we need a response function to specify the probability of falling into each category. We do this by defining the cumulative


In general, for a $p$-dimensional table, the listing of the cells is given by forming the product

    2 x 2 x ..x 2

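An E in front of such an expression means that the expectation is to be taken of all cell frequencies designated by the product.

Finally, we define the $(r+1)\times(r+1)$ matrix $\Delta_r$ as follows:

$$\Delta_r = \begin{pmatrix} 1 & -1 & 0 & 0 & \cdots & 0\\ 0 & 1 & -1 & 0 & \cdots & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & 1 & -1\\ 0 & 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$

This matrix forms the first differences of the elements of any column vector which it pre-multiplies. In particular, if it pre-multiplies a vector of cumulative probabilities such as given by (26) it yields the category probabilities of (27). The theorem is as follows:

Theorem. The expected cell frequencies can be computed from the formula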

    E 2 x 2 x ..x =N[A1 xA,2x ..xA,j

    E 1 Xr2Y 1) X-1Xrp1(Y)7EIrl(y) 72.2Y) 7rp'#(Y)

(Note that the absence of the "×" sign before E on the right-hand side implies matrix multiplication of the standard kind.)


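Proof. The marginal probabilities that an individual falls into the various categories in the $i$th dimension, given $y$, are given by

$$\Delta_{r_i}\begin{pmatrix} 1\\ \pi_{i1}(y)\\ \pi_{i2}(y)\\ \vdots\\ \pi_{ir_i}(y) \end{pmatrix}. \tag{29}$$

Thus the probability of falling into any cell of the table is obtained by multiplying together the relevant marginal probabilities for that cell (since the events are independent, given $y$). This is achieved for all cells together by forming the Kronecker product of the vectors (29) taken over $i$ in the same order as in the statement of the theorem. Using the standard result that

$$(Ax \times By \times \cdots) = (A \times B \times \cdots)(x \times y \times \cdots)$$

the result then follows on taking expectations with respect to $y$.

The theorem identifies all the expectations which have to be evaluated. In all there are $(r_1+1)(r_2+1)\cdots(r_p+1) - 1$ integrals to be calculated. The vector of expectations is then converted into one of expected frequencies by pre-multiplying by the Kronecker product of the differencing matrices. The theorem includes, as a special case, the $2^p$ table obtained by setting $r_i = 1$ for all $i$.

The y-score in dimension $j$ for cell $x$ is given by

$$E(y_j \mid x) = \int_0^1\!\cdots\!\int_0^1 y_j\,p(y \mid x)\,dy = \frac{N\int_0^1\cdots\int_0^1 y_j\,p(x \mid y)\,dy}{N\int_0^1\cdots\int_0^1 p(x \mid y)\,dy}. \tag{30}$$

The theorem provides a formula for the denominator of this expression. The numerator can be found by multiplying the Kronecker product after the E by $y_j$ and then evaluating the expression as before.

As an illustration consider the $2 \times 3$ case used as an example at the beginning of the section:

$$E\left[\begin{pmatrix} 1\\ \pi_{11}(y) \end{pmatrix} \times \begin{pmatrix} 1\\ \pi_{21}(y)\\ \pi_{22}(y) \end{pmatrix}\right] = E\begin{pmatrix} 1\\ \pi_{21}(y)\\ \pi_{22}(y)\\ \pi_{11}(y)\\ \pi_{11}(y)\pi_{21}(y)\\ \pi_{11}(y)\pi_{22}(y) \end{pmatrix}$$

and each element of this must be calculated by numerical integration. The resulting vector is now pre-multiplied by

$$\Delta_1 \times \Delta_2 = \begin{pmatrix} 1 & -1\\ 0 & 1 \end{pmatrix} \times \begin{pmatrix} 1 & -1 & 0\\ 0 & 1 & -1\\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 0 & -1 & 1 & 0\\ 0 & 1 & -1 & 0 & -1 & 1\\ 0 & 0 & 1 & 0 & 0 & -1\\ 0 & 0 & 0 & 1 & -1 & 0\\ 0 & 0 & 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

to give the expected cell frequencies.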
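The differencing construction in this example can be sketched in a few lines. The code is an illustration only: the function names are hypothetical, the cumulative probabilities are invented values for a single fixed $y$, and no integration over $y$ is attempted.

```python
def differencing(r):
    # (r+1) x (r+1) matrix with 1 on the diagonal and -1 just above it.
    size = r + 1
    return [
        [1 if j == i else -1 if j == i + 1 else 0 for j in range(size)]
        for i in range(size)
    ]


def kron(a, b):
    # Kronecker product of two matrices given as lists of rows.
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def kron_vec(u, v):
    # Kronecker product of two column vectors given as flat lists.
    return [x * y for x in u for y in v]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


# 2 x 3 table: one cumulative probability in the first dimension and two
# in the second (invented values, for one fixed y).
v1 = [1.0, 0.6]
v2 = [1.0, 0.7, 0.2]

big_delta = kron(differencing(1), differencing(2))
cell_probs = matvec(big_delta, kron_vec(v1, v2))

if __name__ == "__main__":
    for row in big_delta:
        print(row)
    print([round(p, 3) for p in cell_probs])
```

Differencing each dimension separately and then multiplying, `kron_vec(matvec(differencing(1), v1), matvec(differencing(2), v2))`, gives the same six cell probabilities, which is the standard result quoted in the proof.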


8. EXTENSIONS AND EVALUATION

8.1. More than one Latent Variable

The methods of fitting the one-factor logit model extend in a natural way to the case of several factors. Theorem 2 holds for all $q$ and hence any method for fitting the normal factor model to the off-diagonal elements of a covariance matrix can be used to provide approximate estimates for the $\alpha$'s. The new method given in Section 6.1 can also be used in the following way. First we fit the parameters $\alpha_{\cdot 1} = (\alpha_{11}, \alpha_{21}, \ldots, \alpha_{p1})$ as already described. Next we construct the residuals $c_{ij} - \hat\alpha_{i1}\hat\alpha_{j1}$. These will include negative quantities, so signs must be changed to render the row totals positive. A second set of parameters $\alpha_{\cdot 2}$ is then fitted to the residuals and $\pi$ is re-estimated. For example, in the case of Upton's (1978) data (Case IV, Table 3) the resulting estimates are

    A = (0O962,O351,546, 0-998,A493),&A2 = (-0-518,0-241, 0518, 0-520,-158),

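$\hat\pi = (0.704, 0.454, 0.469, 0.703, 0.389)$.

The estimate of $\pi$ is very close to that for the one-factor model; A has been reduced from 89.83 to 61.47, which is still a poor fit. An iterative procedure could now be used to improve the estimates but the feasibility of this remains to be explored. Calculations of $\theta_{ij}$ for two-factor models similar to those given in Table 1 suggest that the approximation given by Theorem 2 is more restricted in its usefulness than when $q = 1$. Further investigation is required.

With more than two factors the probit model offers computational advantages. This is because the expectations $E\{\pi_i(y)\pi_j(y)\}$ can always be reduced to bivariate integrals however many latent variables there are. In fact it may easily be shown that

$$E\,\pi_i(y) = \Phi(\lambda_{i0}), \tag{31}$$

$$E\,\pi_i(y)\pi_j(y) = \int_{-\infty}^{\lambda_{i0}}\int_{-\infty}^{\lambda_{j0}} \phi(z_1, z_2; \rho_{ij})\,dz_1\,dz_2 \qquad (i \neq j), \tag{32}$$

where $\phi$ is the standard bivariate normal density with correlation coefficient

$$\rho_{ij} = \sum_{k=1}^{q} \lambda_{ik}\lambda_{jk}, \tag{33}$$

where $\lambda_{ik} = \alpha_{ik}/(1 + \sum_{h=1}^{q}\alpha_{ih}^2)^{1/2}$ $(k = 0, 1, \ldots, q)$.

The various estimation methods proposed by Bock and Lieberman (1970), Christoffersson (1975) and Muthén (1978) for this model are based essentially on (31) and (32). Their estimates of $\{\lambda_{ik}\}$ can easily be converted into estimates for $\{\alpha_{ik}\}$ which, in turn, should be good approximations to the corresponding parameters of the logit model.

From the point of view of interpretation and for reasons given below we prefer to use the logit parameterization for which

$$\pi_i = G(\alpha_{i0}) = G\left\{\lambda_{i0}\Big/\Big(1 - \sum_{h=1}^{q}\lambda_{ih}^2\Big)^{1/2}\right\}, \qquad \alpha_{ih} = \lambda_{ih}\Big/\Big(1 - \sum_{h=1}^{q}\lambda_{ih}^2\Big)^{1/2}. \tag{34}$$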
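The conversion between the two parameterizations is a simple rescaling, if (33) and (34) are read as $\lambda_{ik} = \alpha_{ik}/(1+\sum_h\alpha_{ih}^2)^{1/2}$ and $\alpha_{ik} = \lambda_{ik}/(1-\sum_h\lambda_{ih}^2)^{1/2}$. A minimal sketch under that reading, with hypothetical function names and invented numerical values:

```python
import math


def alpha_to_lambda(alpha0, alphas):
    # lambda_ik = alpha_ik / sqrt(1 + sum_h alpha_ih^2), k = 0, 1, ..., q;
    # the sum runs over the factor loadings only (h = 1, ..., q).
    scale = math.sqrt(1.0 + sum(a * a for a in alphas))
    return alpha0 / scale, [a / scale for a in alphas]


def lambda_to_alpha(lam0, lams):
    # Inverse map: alpha_ik = lambda_ik / sqrt(1 - sum_h lambda_ih^2).
    scale = math.sqrt(1.0 - sum(l * l for l in lams))
    return lam0 / scale, [l / scale for l in lams]


def correlation(lams_i, lams_j):
    # rho_ij = sum_k lambda_ik * lambda_jk, the correlation used in (32).
    return sum(a * b for a, b in zip(lams_i, lams_j))


if __name__ == "__main__":
    # Invented two-factor loadings for two items.
    lam0_i, lams_i = alpha_to_lambda(0.5, [1.0, 0.5])
    lam0_j, lams_j = alpha_to_lambda(-0.2, [0.8, -0.3])
    print(round(lam0_i, 4), [round(l, 4) for l in lams_i])
    print(round(correlation(lams_i, lams_j), 4))
    print(lambda_to_alpha(lam0_i, lams_i))
```

The round trip through both functions recovers the original $\alpha$'s, which is the sense in which estimates of $\{\lambda_{ik}\}$ "can easily be converted" into estimates of $\{\alpha_{ik}\}$.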

8.2. Comparisons with Other Methods

It is not unusual to find factor analyses carried out on covariance matrices regardless of the form of the distribution of the manifest variables. For the $2^p$ table it is perfectly possible to treat the indicator variables $\{x_i\}$ just like any others and to perform a factor analysis on the estimated covariance matrix. Such an analysis can be roughly interpreted in terms of our model, when the $\alpha_{ik}$'s $(k = 1, 2, \ldots, q)$ are small, as follows. From Theorem 2, a factor model fitted to the off-diagonal elements will estimate the quantities $\{G'(\alpha_{i0})\alpha_{ik}\}$ $(k = 1, 2, \ldots, q)$. The variance of $x_i$ is $G(\alpha_{i0})\{1 - G(\alpha_{i0})\} + O(\alpha^2)$. For the logit function $G'(v) = G(v)\{1 - G(v)\}$ and hence the sample variances estimate $G'(\alpha_{i0})$ and therefore the $\alpha$'s are determined. It is doubtful whether such a procedure has any practical value.

We have already noted that the logit and probit models are likely to give similar numerical results. At the conceptual level there are considerable advantages in developing both models by means of the latent structure arguments used here. The traditional "factor analysis" approach assumes that there are two tiers of latent variables. First there is supposed to be a latent variable underlying each dichotomy; a positive response is then observed if that variable exceeds a threshold value. Secondly, these variables are related to the second tier of latent variables by the usual common factor model. This may be plausible in some applications, but with dichotomies based on house ownership, trade union membership and such like, the notion of an underlying latent variable and its associated threshold is somewhat artificial. When we add to this the argument that the forms of the distributions of the latent variables are essentially arbitrary, the usual model appears as no more than a convenient fiction. It is for these reasons that we prefer the parameterization in (34), which has a more robust interpretation.

Similar considerations apply to the hybrid model of Lord and Novick (1968) in which G is a logit and H a probit function. Sanathanan and Blumenthal (1978) have given a maximum likelihood method for estimating its parameters similar to the EM algorithm. There is clearly room for further study of the numerical aspects of all models in the light of current, and the likely future, state of computer technology.

It is unfortunate that no other suitable response function has come to light for which the various integrals have simple explicit forms.
If we are prepared to abandon the symmetry conditions we could consider such functions as

$$\pi_i(y) = y^{\alpha_i} \quad\text{or}\quad \pi_i(y) = 1 - (1-y)^{\alpha_i} \tag{35}$$

for $q = 1$. These models can be fitted very easily but, with only one parameter, they are not sufficiently flexible for most purposes. Introducing further parameters quickly destroys their simplicity. The approximate method of fitting our logit model seems to come nearest to combining simplicity and flexibility. Whether or not it is good enough to be generally useful for $q > 1$ requires further investigation. In the meantime the various methods for the probit model are available.

ACKNOWLEDGEMENTS

The approach on which this paper is based was first outlined in a paper read at the Society's conference at Oxford in March 1979. I am grateful to several participants for suggestions and especially to Dr J. A. Anderson, whose remarks led to a major change of direction. The suggestions of referees and other readers of an earlier version have also led to many improvements. The method of fitting the logit model in Section 6 has been programmed in FORTRAN by J. Tomenson, to whom I owe a special debt.

REFERENCES

AIGNER, D. J. and GOLDBERGER, A. S. (1977) (eds). Latent Variables in Socio-economic Models. Amsterdam: North-Holland.
BOCK, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.
BOCK, R. D. and LIEBERMAN, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-197.
CHRISTOFFERSSON, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.
FIELDING, A. (1977). Latent structure models. In The Analysis of Survey Data, Vol. 1: Exploring Data Structures (C. A. O'Muircheartaigh and C. Payne, eds), pp. 125-157. London: Wiley.
GOODMAN, L. A. (1978). Analyzing Qualitative/Categorical Data: Log-Linear Models and Latent Structure Analysis. Reading, Mass.: Addison-Wesley.
HARMAN, H. H. (1970). Modern Factor Analysis, 2nd ed. Chicago: University of Chicago Press.
LAZARSFELD, P. F. and HENRY, N. W. (1968). Latent Structure Analysis. New York: Houghton Mifflin.
LOMBARD, H. L. and DOERING, C. R. (1947). Treatment of the four-fold table by partial association and partial correlation as it relates to public health problems. Biometrics, 3, 123-128.
LORD, F. M. and NOVICK, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, Mass.: Addison-Wesley.
McDONALD, R. P. (1969). The common factor analysis of multi-category data. Brit. J. of Math. and Statist. Psych., 22, 165-175.
MUTHÉN, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551-560.
PLACKETT, R. L. (1974). The Analysis of Categorical Data. High Wycombe: Griffin.
SANATHANAN, L. and BLUMENTHAL, S. (1978). The logistic model and estimation of latent structure. J. Amer. Statist. Ass., 73, 794-799.
SOLOMON, H. (1961). Classification procedures based on dichotomous response vectors. In Studies in Item Analysis and Prediction (H. Solomon, ed.), pp. 177-186. Stanford: Stanford University Press.
UPTON, G. J. G. (1978). The Analysis of Cross-tabulated Data. London: Wiley.

DISCUSSION OF PROFESSOR BARTHOLOMEW'S PAPER

Professor MURRAY AITKIN (University of Lancaster): I am pleased to propose the vote of thanks for David Bartholomew's paper. The subject of latent variable models is of rapidly increasing practical importance and unifies a number of apparently unconnected statistical areas. Professor Bartholomew notes in Section 1 that latent variable models have not found wide favour with statisticians, partly because of the difficulty of fitting the models, and that attention to such models by statisticians is overdue. His paper tonight takes an important step towards developing proper latent variable models for categorical data.

The basis of the models is set out early in the paper. An early distinction is made between continuous and discrete latent variables, though the conditional independence model has been used with both types of latent variable. Given the properties (i)-(vi) required of the response function, the choice comes down essentially to a probit/logit choice for both the manifest variables and the latent variables. Bartholomew chooses the logit/logit model, for reasons which are not entirely clear. Section 5 and the section before it discuss in detail the fitting of the model. Here the cross-ratios play an important role. The LSAT example discussed shows that the computing method for the logit model gives estimates consistent with the ML estimates for the probit model of Bock and Lieberman. The method is practicable, though it is not clear that it gives efficient estimates.

Latent variable models are natural candidates for ML estimation by the EM algorithm (Dempster, Laird and Rubin 1977).
The logit model is unsuitable for EM, but the probit model is very suitable, as the sufficient statistics in the "complete data" model are just the usual regression sums of squares and cross-products. DLR pointed out that ML estimation in the normal factor model could be achieved by EM using simple back-and-forward least squares computations, and John Hinde has developed at Lancaster a GENSTAT macro for exploratory factor analysis using EM. Hasselblad, Stead and Creason, in a note to appear in Biometrics, point out that the standard probit analysis model for a dose-response curve can be fitted by EM. The combination of these two approaches allows the estimation of the parameters in the probit/probit latent variable model by ML using an EM algorithm. I am currently completing joint work with Darrell Bock on this procedure.

Professor Bartholomew notes that categorical latent variables may be treated by latent class analysis. While continuous latent variables are more natural choices for abilities and attitudes, it is of some interest that the simple latent class model also provides a scaling of the cell entries on a scale which is essentially continuous. This is of practical value because the computations for fitting the latent class model are the same as those for a general mixture model, and can easily be done in GLIM using an EM algorithm. The cancer knowledge example (Set I) provides a simple illustration.

The latent class model used is a two-component multinomial mixture. There are two classes of people: well-informed, and badly informed. In Professor Bartholomew's notation, $y$ in (1) is a Bernoulli variable, with $P(y = 1) = \lambda$, $P(y = 0) = 1 - \lambda$. The response function of (5) is then just

$$\pi_i(x_i \mid y) = \pi_{iy}^{x_i}(1 - \pi_{iy})^{1 - x_i},$$

the $y$ suffix indicating that there are two sets of $\pi_i$ in the two latent classes. The conditional probability function of $y$ given $x$ in (2) is then

$$P(y = 1 \mid x) = \frac{\lambda f(x \mid y = 1)}{\lambda f(x \mid y = 1) + (1 - \lambda) f(x \mid y = 0)},$$

a monotone function of the likelihood ratio for the two components, and similarly for $P(y = 0 \mid x)$.

The EM algorithm begins with starting values for the probabilities of latent class membership, most simply by assigning each cell to one of the two classes. Parameter estimates are then obtained in the M-step from the conditional independence model. These are substituted into the likelihood ratio to give new probabilities of class membership in the E-step. The sequence of steps continues till convergence to the ML estimates of the $\pi_{iy}$. This is very simply accomplished in GLIM with a small macro.

At convergence, the probabilities of latent class membership have also converged, and these also provide a ranking of cells from "most well informed" to "least well informed." In addition, for the conditional independence model, the log of the ratio of the probabilities of class membership is a linear function of the $x_i$, a linear discriminant function, whose coefficients are the log-odds ratios for the $i$th item. This discriminant function can also be used to scale the individual cells. A small or zero coefficient indicates that the corresponding item does not discriminate between well and poorly informed classes, and can be dropped from the scale.

In the cancer data, the two-class model gives a goodness-of-fit value A of 15.4, using one extra parameter, so it fits as well as the one-factor model. The discriminant function is

$$1.43x_1 + 3.62x_2 + 2.35x_3 + 1.61x_4,$$

assigning most weight to newspapers, next to solid reading, and least to radio and lectures. These coefficients are very similar in relative magnitude to the factor loadings in Table 3, column 2. The discriminant score, and the estimated probability of belonging to the well-informed group, are shown in Table D1, together with the cell code and the estimated factor score from Table 4.
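The E- and M-steps just described can be sketched as follows for the two-class model applied to the observed frequencies of Table 4. This is a minimal illustration in Python, not the GLIM macro referred to above. The starting weights, the number of iterations and all names are assumptions of the sketch, and no attempt is made to reproduce the quoted estimates.

```python
import math

# Observed frequencies of Table 4 (Lombard and Doering's data).
counts = {
    "0000": 477, "0001": 12, "0010": 150, "0011": 11,
    "0100": 231, "0101": 13, "0110": 378, "0111": 45,
    "1000": 63, "1001": 7, "1010": 32, "1011": 4,
    "1100": 94, "1101": 12, "1110": 169, "1111": 31,
}


def cell_prob(cell, pis):
    # Conditional independence: product of Bernoulli terms over the items.
    prob = 1.0
    for digit, p in zip(cell, pis):
        prob *= p if digit == "1" else 1.0 - p
    return prob


def em_two_class(data, n_iter=200):
    n_total = sum(data.values())
    n_items = len(next(iter(data)))
    # Assumed start: cells with two or more positive responses lean
    # towards class 1 (the "well-informed" class).
    weight = {c: 0.9 if c.count("1") >= 2 else 0.1 for c in data}
    loglik_path = []
    for _ in range(n_iter):
        # M-step: weighted proportions within each class.
        mass1 = sum(n * weight[c] for c, n in data.items())
        mass0 = n_total - mass1
        lam = mass1 / n_total
        pi1 = [sum(n * weight[c] for c, n in data.items() if c[i] == "1") / mass1
               for i in range(n_items)]
        pi0 = [sum(n * (1.0 - weight[c]) for c, n in data.items() if c[i] == "1") / mass0
               for i in range(n_items)]
        # E-step: posterior probability of class 1 for each cell.
        loglik = 0.0
        for c, n in data.items():
            f1 = lam * cell_prob(c, pi1)
            f0 = (1.0 - lam) * cell_prob(c, pi0)
            weight[c] = f1 / (f1 + f0)
            loglik += n * math.log(f1 + f0)
        loglik_path.append(loglik)
    return lam, pi1, pi0, weight, loglik_path


if __name__ == "__main__":
    lam, pi1, pi0, weight, path = em_two_class(counts)
    print("lambda:", round(lam, 3))
    for cell in sorted(weight):
        print(cell, round(weight[cell], 3))
```

The log-likelihood recorded at each cycle should never decrease, which gives a simple check that the two steps have been coded consistently.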

TABLE D1

                                       Probability
         Factor (y)    Discriminant    of being
Cell     score         score           well-informed
0000     0.212         0.00            0.029
0001     0.304         1.61            0.131
0010     0.384         2.35            0.241
0011     0.475         3.96            0.613
0100     0.522         3.62            0.531
0101     0.615         5.23            0.850
0110     0.700         5.97            0.922
0111     0.797         7.58            0.983
1000     0.304         1.43            0.112
1001     0.394         3.04            0.386
1010     0.475         3.78            0.569
1011     0.567         5.39            0.868
1100     0.615         5.05            0.825
1101     0.711         6.66            0.959
1110     0.796         7.40            0.980
1111     0.889         9.01            0.996

A plot of the factor score against the discriminant score shows a very nearly linear relationship.

Similar results are obtained for the LSAT data of Bock and Lieberman. For Section 6, the value of A for the mixture model after 2 iterations is 23.9, with discriminant function

$$1.66x_1 + 1.48x_2 + 1.91x_3 + 1.32x_4 + 1.26x_5.$$

For Section 7, A is 35.5 after 2 iterations with discriminant function

$$1.76x_1 + 2.02x_2 + 2.67x_3 + 1.48x_4 + 1.37x_5.$$

The goodness-of-fit of both models can be improved by further iterations, without essentially changing the discriminant function.
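Because the log of the ratio of the class membership probabilities is linear in the $x_i$, the last column of Table D1 can be recovered, to rounding accuracy, from the discriminant score alone. The sketch below assumes that the constant term can be backed out from the tabulated probability 0.029 for cell 0000; that choice, and the function names, are ours.

```python
import math

# Discriminant coefficients quoted for the cancer data.
coefficients = (1.43, 3.62, 2.35, 1.61)

# Assumed intercept: the log-odds of the tabulated probability for cell 0000.
intercept = math.log(0.029 / (1.0 - 0.029))


def discriminant_score(cell):
    # Sum of the coefficients of the items answered positively.
    return sum(c for c, digit in zip(coefficients, cell) if digit == "1")


def prob_well_informed(cell):
    # Logistic transform of intercept plus discriminant score.
    return 1.0 / (1.0 + math.exp(-(intercept + discriminant_score(cell))))


if __name__ == "__main__":
    for cell in ("0000", "0001", "0011", "0100", "1111"):
        print(cell, round(discriminant_score(cell), 2),
              round(prob_well_informed(cell), 3))
```

Small differences from the tabulated probabilities are to be expected, since both the coefficients and the probability used for the intercept are rounded.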

  • 8/14/2019 Factor Analysis Discrete Data 2985165

    23/30

I referred at the beginning of these comments to the value of latent variable models in unifying apparently unconnected areas. I should like to conclude with an example: regression on principal components. It is common practice, or at least commonly advocated practice, to extract principal variables from a highly correlated set of predictors, and then regress the response on a suitable subset of the principal variables. A latent variable model, the MIMIC model of Jöreskog and Goldberger (1975), makes clear what principal component regression is trying to achieve. Underlying the set of predictors $x$ is a set of latent variables $z$. Given $z$, the response $y$ is independent of $x$ and depends on $z$ through a regression model. The $x$ are conditionally independent given $z$:

$$y_i \mid z_i \sim N_1(\gamma + \beta' z_i, \tau^2) \ \text{independently},$$
$$x_i \mid z_i \sim N(\theta + \Lambda z_i, \Psi) \ \text{independently of } y_i,$$

with $\Psi$ a diagonal matrix, while marginally $z_i \sim N(0, I)$.

The model can again be fitted by ML using an EM algorithm, and a GENSTAT program for this is under development by John Hinde.

In conclusion, I have much pleasure in proposing the vote of thanks for this stimulating and important paper.

Dr A. M. SKENE (University of Nottingham): Professor Bartholomew has described an approach to the analysis of ordinal data which is a very welcome addition to the rather sparse body of theory which exists at present in this area. The full impact must of course await future developments which release some of the computational constraints imposed by the present estimation procedure. The restriction to one or perhaps two latent variables must appear to be a very severe restriction to those accustomed to using normal theory factor analysis. However, accepting this limitation and aware of the arbitrary nature of both the distribution of $y$ and $\pi(x \mid y)$, I looked to possible uses of this logit latent structure model.

There are two areas of application: modelling and data reduction.

A single continuous latent variable may provide a very good explanation for the pattern observed in a multidimensional contingency table, particularly if there are extra-statistical arguments to support the existence of such a variable. On the other hand, latent variables tend to be mental constructs and thus equally valid arguments can usually be made for a discrete formulation. The flexibility of latent class analysis as described by Goodman (1978) suggests that the discrete formulation should be our starting point, with the logit model being adopted when the latent class analysis reveals classes having a clear ordering.

The logit model effects data reduction by replacing $x$ by $E(y \mid x)$, the factor scores. It follows from the conditional independence of the manifest variables given $y$ that it is a trivial matter to calculate $E(y \mid x^{(1)})$, where $x^{(1)}$ is any subvector of $x$. This, coupled with the fact that parameter estimation only requires knowledge of the one and two way margins of the manifest variables, leads to the observation that the logit model's applicability is unaffected by missing data.

Once $p(y \mid x)$ has been obtained, however, its mean may be totally inappropriate as a summary measure. Fig. D1 displays two instances of $p(y \mid x)$ for the Lombard and Doering data. The problem is that of finding suitable summary measures for heavily skewed distributions. The choice of summary statistic is even more complicated when two latent variables are fitted and it is certainly dangerous to make much here of the analogy with normal factor analysis.

FIG. D1. Conditional distributions for Lombard and Doering data, plotted against the y score from 0.0 to 1.0. (i) $p(y \mid x' = (0, 0, 0, 0))$. (ii) $p(y \mid x' = (0, 1, 1, 0))$.

Effective data reduction is striking a balance between reducing dimension and retaining that information which is relevant to a specific objective. The real value of factor scores or conditional distributions cannot be judged in the abstract and it is pointless debating the meaning of these quantities. Their ultimate value must be judged by, for example, the accuracy of the final predictive equation or the insights gained into the subject of the analysis.

This point is relevant to all latent structure models and can be illustrated by the following model used for medical diagnosis.

Given $I$ diseases $D_i$, $i = 1, \ldots, I$, and symptom vector $S$, one possible formulation of $p(S \mid D_i)$ is the latent class model

$$p(S \mid D_i) = \sum_{j=1}^{n}\prod_{k=1}^{K} P_k(S_k \mid C_j)\,p(C_j \mid D_i). \tag{1}$$

Conditional upon latent class, $C_j$, we assume that the symptoms are mutually independent and independent of $D_i$. The parameters of this model, viz. the parameters of $P_k(\cdot \mid \cdot)$ and the probabilities $p(C_j \mid D_i)$, $j = 1, \ldots, n$; $i = 1, \ldots, I$, can be estimated using training data, Skene (1978), and, given a particular realisation of $S$, say $T$, disease probabilities $p(D_i \mid T) \propto p(T \mid D_i)\,p(D_i)$ can be computed.

Equation 1 makes absolutely no claim to be a representation of the truth. In any particular application it stands or falls by its ability to correctly diagnose patients.

There is a second way of writing this model. Given $T$, we may first calculate

$$p(C_j \mid T) \propto \prod_{k} P_k(T_k \mid C_j)\,p(C_j),$$

where

$$p(C_j) = \sum_{i} p(C_j \mid D_i)\,p(D_i),$$

and then calculate

$$p(D_i \mid T) = \sum_{j} p(D_i \mid C_j)\,p(C_j \mid T).$$
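That the two ways of writing the model give the same disease probabilities can be checked numerically. The sketch below uses two diseases, two latent classes and three binary symptoms; every number in it is invented for illustration, and the function names are hypothetical.

```python
def class_likelihoods(symptoms, p_symptom_given_class):
    # Product over symptoms of P_k(T_k | C_j), one value per latent class.
    out = []
    for probs in p_symptom_given_class:
        value = 1.0
        for t, p in zip(symptoms, probs):
            value *= p if t == 1 else 1.0 - p
        out.append(value)
    return out


def normalise(values):
    total = sum(values)
    return [v / total for v in values]


def direct_route(symptoms, prior_d, p_class_given_d, p_symptom_given_class):
    # p(D_i | T) proportional to p(T | D_i) p(D_i), with p(T | D_i) from (1).
    like = class_likelihoods(symptoms, p_symptom_given_class)
    joint = [
        prior_d[i] * sum(p_class_given_d[i][j] * like[j] for j in range(len(like)))
        for i in range(len(prior_d))
    ]
    return normalise(joint)


def two_step_route(symptoms, prior_d, p_class_given_d, p_symptom_given_class):
    like = class_likelihoods(symptoms, p_symptom_given_class)
    n_d, n_c = len(prior_d), len(like)
    # p(C_j) = sum_i p(C_j | D_i) p(D_i)
    p_class = [sum(p_class_given_d[i][j] * prior_d[i] for i in range(n_d))
               for j in range(n_c)]
    # Step 1 (data reduction): p(C_j | T)
    p_class_given_t = normalise([like[j] * p_class[j] for j in range(n_c)])
    # Step 2: p(D_i | T) = sum_j p(D_i | C_j) p(C_j | T), using Bayes for p(D_i | C_j)
    return [
        sum(p_class_given_d[i][j] * prior_d[i] / p_class[j] * p_class_given_t[j]
            for j in range(n_c))
        for i in range(n_d)
    ]


# Invented illustrative values.
prior_d = [0.3, 0.7]                                  # p(D_i)
p_class_given_d = [[0.8, 0.2], [0.1, 0.9]]            # p(C_j | D_i)
p_symptom_given_class = [[0.9, 0.7, 0.6], [0.2, 0.3, 0.1]]  # P(S_k = 1 | C_j)
symptoms = [1, 0, 1]                                  # realisation T

if __name__ == "__main__":
    print(direct_route(symptoms, prior_d, p_class_given_d, p_symptom_given_class))
    print(two_step_route(symptoms, prior_d, p_class_given_d, p_symptom_given_class))
```

The agreement is exact up to rounding error, since substituting Bayes' rule for $p(D_i \mid C_j)$ into the second formulation cancels $p(C_j)$ term by term.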

The probabilities $p(C_j \mid T)$, $j = 1, \ldots, n$, define a probability distribution over the latent classes and this alone is used in calculating the disease probabilities.

This particular formulation makes the two steps of the classification much clearer. The first step of data reduction is followed by the using of the transformed data. However, this formulation is also very seductive as it exposes the latent classes and raises the possibility that they might have real meaning. Such emphasis is, in the main, unwarranted.

Professor Bartholomew, in effect, has described a rather different way of doing this first step of data reduction. The ultimate test of his particular model is whether effective predictions or good understanding of particular data sets result.

I have much pleasure in seconding the vote of thanks.

The vote of thanks was passed by acclamation.

Mr C. J. SKINNER (University of Southampton): I should also like to thank Professor Bartholomew for a very interesting paper. I particularly enjoyed the discussion of the response function and note that certain latent class models may also be included in the general formulation of equation (6), if H becomes a discrete valued function.

My main comments concern the suggestion in this paper that the logit model is preferable to the numerically similar probit model, and I should like to offer a few words in defence of the probit model. One reason given for preferring the logit model is that Section 6 provides a simple approximate solution (at least when $q = 1$). This solution may, however, be viewed as an iterated analogue of the simple heuristic solution for the probit model, where the estimated $c_{ij}$'s correspond to the tetrachoric correlations. In fact, if one attempts to iterate the heuristic method in a corresponding manner, one finds that successive iterations give an identical solution because, under the probit parameterisation, $E\,\pi_i(y)$ does not depend on the factor loadings and the corresponding $\theta_{ij}$'s are all unity. One advantage of such non-iterated two-stage procedures is that conventional factor analysis packages or more sophisticated correlation structure packages such as LISREL (Jöreskog and Sörbom, 1978) may be used with categorical variables or with combinations of categorical and continuous variables.

Available empirical evidence, as in Table 2, suggests that point estimates obtained by the heuristic procedure are very close to the full maximum likelihood estimates. A supposed problem with the heuristic procedure, as for example stated by Muthén (1978), is that of obtaining a statistical test of model fit. However, a chi-squared goodness of fit test may be obtained by direct analogy with the approach of this paper, where the computation of the test statistic requires the evaluation of a number of multivariate normal integrals. Perhaps a more difficult problem with the heuristic method is that of obtaining standard errors. As Professor Aitkin has noted, this problem is not referred to in this paper, and it would be interesting to know if the method in Section 6 can be simply adapted to give standard errors.

Finally, I find in teaching this subject that the probit model provides an easily understood modification of normal theory factor analysis, and I find it valuable to demonstrate simple links between contingency table analysis and continuous variable analysis.

    Dr G. J.G.UPTONUniversityfEssex): rofessorartholomewas etup an elegantmathematicalsuperstructurehich illsme, t east,withwe and wonder.will hereforeonfine y ommentsoadiscussionf he esultshat e hasobtained.Table includeshy-scoresor he ombardndDoering ata,whichave pparentlyeen btainedbynumericalntegrationf 10).These cores re,however,implyonnectedo the stimatesf heparameters.sing ubscripts, , k and 1, ach aking alues or1correspondingothe elldefinition,note hat nexcellentit o they-scoress given y4 895yijki = 1 024+Oei+a2i+a3k+a4l.I havebeenunable ofind n explanationor his quation,ut he its too good tobe accidental.I am unhappybout he cant ttentionaidbyProfessorartholomewo thenterpretationfhisresults.nparticular,cannot elievehatmy ookhasbeen o widelyead hatt s unnecessaryodefinethemanifestimensionsor ata et V.These rereferendumotesforr gainstntrynto he ommonMarket),oliticalllegiancen1975,mount f chooling,nionmembershipnd ocial lass. heorder

of categories of union membership is the reverse of the order given in my book. The positive α-values place the minimally-schooled working-class anti-Common Market Labour union member at the left hand end of the political axis, which is reasonable. However, it is distinctly surprising that the strongest contrasts are those related to referendum vote and union membership rather than manifest political allegiance.

In Section … Professor Bartholomew fits a second latent factor to the referendum data. For this factor it is the minimally-schooled non-union middle-class anti-Common Market Conservative who is at one end of an axis. Could this be an age dimension? However, I am unhappy about the assumption that one of the two latent dimensions in the two-factor model will of necessity be the dimension found in the one-factor model. I would have felt that, in a case where there were really two dimensions, the single dimension found by fitting the one-factor model is more likely to be an over-worked hybrid lying between the two. My hypothesis could be tested by creating a data set which was derived, without random variation, from two latent variables, and then fitting the single latent variable model.

Mr G. J. A. STERN (I.C.L.): Many branches of science seem to follow the course of cosmological theories suggested in the lines, which consist of a couplet by Pope and a modern sequel:

Nature, and Nature's laws lay hid in night,
God said, "Let Newton be," and all was light.
It did not last, the Devil shouting "Ho
Let Einstein be" restored the status quo.

Darkness is followed by a clarifying theory, which is followed by worse darkness in the shape of sophistication leading to incomprehensibility and possibly meaninglessness, so that clarification follows an upside-down U-curve.

It seems to me that Spearman's g meant something: many measures of intelligence were highly correlated to a single factor. Likewise, I would suggest that linear combinations of the original variates, where the combination has meaning, as is often the case with principal components, mean something. It is harder to see what the full factor model means, with all sorts of non-orthogonality and the impossibility of expressing the factors in terms of the original variates (except by estimation).


If factor analysis type of theory were to be applied at all, it should be, I suggest, to precise data where there are many readings so that precise estimates of parameters can be made. I would suggest that the cancer data, for example, is far from that. Can people really recall accurately from which combination of radio, papers etc they got their cancer knowledge from? I suggest that another sample would yield different answers, which would greatly alter the estimates of the parameters. Moreover, even the one-factor model is fitting eight parameters and a variate to what are really sixteen points, and a two factor model would be worse still in leading to estimates with (as I believe) huge variance.

I don't know what the answer is, even after playing with the data of Table …, but would venture these suggestions:

(1) In fact even the independence fit is not all that bad having regard to the imprecision of the data. At most, I suggest, a slight modification of this assumption is needed.
(2) More than one factor should not be considered for the above reasons.
(3) Possibly the model would be more convincing if the … had a physical meaning, perhaps related to the correlation between the answers to the questions.
(4) If parameters as well as the … are needed, I would suggest that the …'s should not be used but only the …'s.

In conclusion, I suggest that with social multivariate data we are often trying to explain, comment on, look at, rather imprecise figures in a way which adds to people's understanding of what the data is saying. Has this been achieved here? I think that quite a few assumptions have been built into the theory whose implications the user of the theory will often not fully comprehend, and so it will be hard for the user to know what has been achieved, in many cases. Certainly I think this is a core on which a clearer and simpler theory could be built, and I would hope that this will be done.

The foregoing expresses my own view.

Dr P. M. E. ALTHAM (Cambridge University): I would like to thank Professor Bartholomew for a useful and stimulating paper, and make two brief points.

(i) Although I should perhaps think further about Dr.
Upton's comments before I speak, my impression is that I would find it not too hard to interpret the latent structure model to a social scientist, and certainly not harder than interpreting a loglinear model with complicated high order interactions.

(ii) Latent structure models possess the following feature which I find attractive. The essential feature of the model is that for the observable variables $x_1, \ldots, x_p$, which are generally discrete, we postulate the existence of the latent variables y, such that given y, $x_1, \ldots, x_p$ are independent. Thus the joint distribution of any subset of $x_1, \ldots, x_p$ has the same structure as that of $x_1, \ldots, x_p$. This seems a desirable property in applications where the number of observable x's may not be very clearly defined; the social scientist would probably want to include or exclude extra x's or "questions" without drastically altering his model. This "invariance" feature is not shared by loglinear analysis, although of course it must be recognised that loglinear and latent structure analyses are addressing rather different problems, but for the same type of data. A consequence of this property, as pointed out already by Dr Skene, is that Professor Bartholomew's logit model is "unaffected" by missing data; I only wish to put the positive advantages of the property more strongly.

The following contributions were received in writing, after the meeting.

Professor E. B. ANDERSEN (University of Copenhagen): It has been very stimulating to read Professor Bartholomew's new unifying approach to latent structure analysis. The key issue is, of course, how to model in latent space. Although Professor Bartholomew argues very forcefully for always having a uniform distribution of the latent variable, it is important to note that many arguments demand that we consider latent distributions with parameters. We may thus be interested in comparing several latent distributions or we may be interested in changes in a latent variable over time. In such cases a statistical analysis will usually take the form of a comparison of the parameters of different latent distributions. One of the models mentioned by Professor Bartholomew, with G a logit and H a probit, was considered in a paper by Andersen and Madsen (1977) and it has recently been extended to cover the type of comparisons mentioned above (Andersen (1980)). If one compares the approach of Professor Bartholomew with the

results just mentioned, it appears that the α-parameters of model (6) play different roles. Some of them are parameters connected with the manifest variables and some of them relate more to the latent variables. For an interpretation of the results of an analysis it may be worth the effort to make such a distinction.


As an example, if we consider the G-logit, H-probit model with one latent variable, $\alpha_{j0}$ will combine the item parameters of the simple logistic item characteristic curve model (or Rasch model) and the mean of the latent variable, while $\alpha_{j1}$ is a constant and equal to the standard deviation of the latent variable.

Dr J. A. ANDERSON (University of Newcastle upon Tyne): Professor Bartholomew's paper is doubly welcome because it combines two important topics, multivariate categorical data analysis and factor analysis. His paper provides a very helpful summary of necessary properties for factor models and contains the interesting results on the expectation of Rii. I agree entirely that these factor models are useful and important but still retain some preference for the probit model.

The method of estimation suggested here is to fit the one-way margins exactly and to optimize a measure of goodness of fit of the two-way margins. A similar approach has been established for the probit model by Mackenzie (1976), with the advantages that (i) the equation (25) is exact and (ii) maximum likelihood estimation for the $(\alpha_{ij})$ conditional on the estimates $(\alpha_i)$ is feasible for many more dimensions than p = 10; these can be shown to be asymptotically efficient and standard errors can be derived. Bock and Lieberman's (1970) limitation, p < 12, refers to the simultaneous estimation of $(\alpha_i)$ and $(\alpha_{ij})$ and may in any case be superseded by better methods of optimization.

A more fundamental concern about the logit model, when there is more than one factor, relates to rotatability. In continuous factor analysis models with, say, k factors, it is possible only to estimate a k-dimensional factor space (Lawley & Maxwell, 1971). The choice of factors within this space is determined subjectively or by external criteria. The probit model for categorical factor analysis has exactly the same property. Rotation of the factor space corresponds to rotation of the factor loadings.
However, the logit model appears not to be rotatable as, unlike the normal case, independent, homoscedastic logistic variates are not invariant under rotation. Since the probit and logit models are so close in other respects, I am concerned that there is an approximate rotatability in the logit model which would lead to numerical instability in the estimation procedure unless recognized and dealt with as in the continuous factor case.

This field has been neglected in the statistical literature and Professor Bartholomew is to be congratulated both on his results and on stimulating our interest.

Mr C. L. F. ATTFIELD (University of Bristol): I found the paper particularly stimulating as the topic is one with which I am not altogether familiar and so the paper served to introduce me to the previous attempts to solve the problem, which Professor Bartholomew shows, can be viewed as special cases of his more general approach. I was impressed by the accuracy of the parameter estimates obtained by the "logit first approximation" method for the one factor case which, Professor Bartholomew states, can be obtained on a pocket calculator. The approximation should prove an invaluable tool in the preliminary analysis of categorical data.

It would be interesting to see the result of relaxing the assumption of independence of the latent variables. In working with unobservable economic variables I find independence very difficult to justify. Would it be possible to work through the analysis without imposing the independence condition and then construct a test for independence?

I would argue with Professor Bartholomew's remark in Section … that latent variables which are "real", i.e. can in principle be measured directly such as "personal wealth", are quite rare. On the contrary most latent variables in economic theory are of exactly this form, e.g. permanent income, the expected rate of inflation, anticipated investment. It is true that in the majority of these cases there is no problem because the latent variables can be associated with manifest variables which variables (or a transformation of them) can be assumed to be continuous and distributed as multivariate normal.
The models can then be estimated using the GLS procedure due to Browne (1974) or the maximum likelihood method due to Jöreskog outlined in Jöreskog and Sörbom (1974).

Professor H. GOLDSTEIN (University of London): I found Professor Bartholomew's paper interesting and useful, but I am a little puzzled by the importance he attaches to assumption (iv) in Section …. This seems to me to lead to an unnecessarily restrictive set of models and I fail to see why, in general, $1 - \pi_i(y)$ should belong to the same family of functions as $\pi_i(y)$. For example, for an examination question with correct/incorrect response one would normally expect an incorrect response, say to a multiple choice question, to be obtained by way of different mental processes to a correct response, and would not therefore expect (iv) to hold. I have argued elsewhere (Goldstein, 1980) that the complementary log log function (which does not satisfy (iv) but does, incidentally, satisfy (ii)) may be in some circumstances a more appropriate one than the logit or probit for exam type data. Whilst I would accept Professor


Bartholomew's justification of (iv) for many kinds of data, it would seem unfortunate if it were to be used where more realistic functions are available.

On the topic of conditional independence (referred to as "local independence" in much of the psychometric literature), Professor Bartholomew asserts in Section ….2 that if conditional independence were not true, then this implies that some other latent variable was exerting an influence on the manifest variables. I am not sure I agree. Suppose we have determined the latent space, and choose a set of individuals at a single point in this space, that is, all having the same set of values on the latent variables. If we consider a $2^p$ table of responses then conditional independence means that the response probabilities in this table depend only on the margins of the table and through these on the latent variables. This, however, seems to be a rather strong assumption, and even if conditional independence did not hold, we might still be able to relate the appropriate "interaction" probabilities in this table, via additional parameters (loadings), to the same set of latent variables. In practice we could presumably attempt this only if we had independent replicate observations on individuals, although this is a difficult thing to achieve in the social sciences. The dimensionality of the latent space is concerned with the between-individual variation, whereas the within-individual dependencies will determine how many parameters are associated with each latent variable.

I would like to endorse strongly what Professor Bartholomew says about the care needed in interpreting results from latent variable models. The history of factor analysis seems to be full of what are essentially mathematically convenient devices being confused with substantive reality. Of course, one way of inculcating a proper caution is if one can show that reasonable but different models, including some which do not satisfy (iv), can lead to quite different interpretations of a common data set. I very much hope that Professor Bartholomew will give some further thought to this issue.

The AUTHOR replied later, in writing, as follows.

The discussion has raised many fundamental questions which merit more extended reply than the present limit on time and space allows.
I am most grateful to those who submitted contributions and the following incomplete remarks are intended as a first contribution to what will I hope be a continuing discussion.

Several speakers have drawn attention to the possibility of using a latent class model in which the latent variable is categorical. As Mr Skinner pointed out, such a model with two latent classes arises as a special case of our general model. Thus suppose we choose

$$\pi_i(y) = \begin{cases} \pi_{i1}, & 0 \le y \le y_0, \\ \pi_{i2}, & y_0 < y \le 1. \end{cases}$$


easily applicable as the probit is now. The simplicity of the approximate method of fitting the one factor logit model already promises well.

A number of questions were raised about the choice of p(y) and p(x | y). However, the point made in the paper about the essential arbitrariness involved in this choice does not seem to have been fully taken. Another way of making the point is to observe that any change of variable in the integral of (1) leaves f(x) unchanged. Since f(x) is all that can be estimated there is no empirical means of distinguishing between any of the combinations of functions which lead to a given f(x). It is thus pointless to argue about the most realistic form of p(y); the question is purely one of convenience. By the same token, analyses which depend in an essential way on any particular form have little meaning. For this reason I am doubtful about the consequences of Dr E. B. Andersen's proposal to make the parameters of p(y) functions of time.

Several speakers mentioned directly or by implication the lack of standard errors and the disregard of questions of efficiency. This omission was not entirely due to lack of space but arises in part from misgivings about the relevance of such concepts in an analysis which is exploratory and descriptive. The overall goodness of fit test is certainly useful. How much standard errors contribute to the interpretation of the analysis over and above that seems more questionable. The root of the matter has to do with how far it is sensible to consider repeated sampling from a population which, in a certain sense, has no real existence.

Dr Skene, Dr J. A. Anderson and Dr Upton all raise matters concerning the case q > 1. There was no space to develop this in the paper and their remarks have been noted for future use. For small α's the normal theory carries over by Theorem … but this is scarcely adequate for general use. The probit model has real advantages here and the closeness of the logit and probit models means that the estimates of the parameters of one can easily be transformed to give estimates of the other.

No one commented on Section … which shows how any model of the family can be fitted to tables with several ordered manifest categories. This brings a much wider range of data within the scope of the analysis and further work is in progress to implement the method.

Apart from these general issues a number of specific points were raised and these
are dealt with, in order, below.

Professor Aitkin's example involving the use of principal components in regression is a good instance of the benefits of setting latent variable problems in a general framework. His example is used to illustrate the same point in Bartholomew (1981).

Dr Skene is right to point out that the mean may not be a good measure of location for the posterior distribution but the remark has to be interpreted in the light of the arbitrariness of p(y). By choosing p to be uniform we ensured that E(y | x) had a "distribution-free" interpretation as the expected quantile of an individual chosen at random from those with a given x.

Mr Skinner's remarks about the relationship between the probit and logit estimation methods are illuminating. I would part company with him in seeing it as a virtue of the probit model that it can be easily linked with the normal model of factor analysis. For teaching purposes I think that this obscures rather than reveals the common underlying structure shared by all latent variable models.

Dr Upton's simple formula for the y-scores is intriguing but not altogether surprising. It is another indication of the common structure which emerges, for example, in Theorem …. For sufficiently small α's the y-scores will satisfy an equation of the form

$$A y = B + \sum_{i=1}^{p} \alpha_{i1} x_i.$$

It is surprising that this form is so good when the α's are not small. This example encourages the hope of finding a general linear approximation which would avoid the need to evaluate the integrals required for the y-scores. I share his regret at the scant attention paid in the paper to the interpretation of the results which, again, was solely due to lack of space. I am grateful to Dr Upton and other contributors for helping to remedy the deficiency.

As always, Mr Stern puts us on our guard against undue sophistication. In this case however, I would dispute all of his conclusions. The study of cancer knowledge was not concerned with asking people about the source of their knowledge of cancer. The point of trying to find latent variables which could be identified with "Knowledge about things in general" was to see how this related to knowledge of cancer.

Drs Altham and Skene drew attention to an important property of the model which is extremely useful in social applications not least in the regression problem referred to by Professor Aitkin. There are usually a great many possible manifest variables from which those included in the study are often chosen in an


arbitrary fashion. As these contributors imply, the manner of this choice does not invalidate the model. It is for this reason that the "naive" interpretation of linear combinations of manifest variables as recommended by Mr Stern may be less simple than appears at first sight.

I was interested to hear from Dr J. A. Anderson about Mackenzie's work and would commend it to Professors Aitkin and Bock since it would be interesting to see how it compares with their method.

Mr Attfield raised two matters which deserve more attention than is possible here. Among economists, I suspect, latent variables such as those he mentions are firmly established in the language of the theory. They will often be correlated and it is natural to want to express the analysis in terms which will relate to economic theory. This can be done, in principle, for the models given here but I would prefer to work with orthogonal latent variables at the first stage and then transform them subsequently to dimensions which are economically meaningful. On the question of "real" latent variables I do not think that there is any substantial difference between us. I defined a real latent variable as one which could, in principle, be measured. I think Mr Attfield regards a variable as real if economists believe it has meaning in economic discourse. The two definitions are not equivalent.

Professor Goldstein's points are, likewise, fundamental. Whatever our differences I fully endorse his penultimate sentence. It is certainly possible to conceive of a situation in which $\pi(y)$ does not satisfy assumption (iv) of Section …. The question at issue may be put as follows: does the labelling of the categories convey any information which is relevant to the analysis? If not then condition (iv) follows. Even if this argument is not accepted there are substantial empirical grounds for adopting (iv). Suppose we take some simple function, not satisfying (iv), such as $\pi(y) = y^a$. Then the function $\pi(y) = 1 - (1 - y)^a$ might appear to serve equally well. Which form should we choose? With p dimensions to the table there are $2^p$ possible combinations. It would be a formidable task to investigate all of them and choose the best. Invoking condition (iv) reduces the options to one.
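The contrast drawn above can be checked directly. The sketch below reads condition (iv), following Professor Goldstein's description, as the requirement that 1 − π(y) belongs to the same family of functions as π(y). The logit form logit π(y) = a0 + a1 logit(y), the parameter values and all function names are assumptions made for illustration. Swapping the two category labels and reversing the direction of y maps the assumed logit family into itself (only the sign of a0 changes), whereas it maps π(y) = y^a to the different function 1 − (1 − y)^a.

```python
import math


def logit_response(y, a0, a1):
    """Assumed logit family: logit pi(y) = a0 + a1 * logit(y), for 0 < y < 1."""
    eta = a0 + a1 * math.log(y / (1.0 - y))
    return 1.0 / (1.0 + math.exp(-eta))


def power_response(y, a):
    """The example of a function not satisfying (iv): pi(y) = y**a."""
    return y ** a


def relabel(response):
    """Swap the category labels and reverse the latent scale: y -> 1 - pi(1 - y)."""
    return lambda y, *params: 1.0 - response(1.0 - y, *params)


grid = [k / 20 for k in range(1, 20)]

# Logit: the relabelled curve is again a member of the family, with parameters (-a0, a1).
logit_gap = max(
    abs(relabel(logit_response)(y, 0.5, 1.5) - logit_response(y, -0.5, 1.5))
    for y in grid)

# Power: the relabelled curve is 1 - (1 - y)**a, which differs from y**a.
power_gap = max(
    abs(relabel(power_response)(y, 2.0) - power_response(y, 2.0))
    for y in grid)

print(logit_gap, power_gap)
```

With a family that is closed under relabelling there is nothing to choose between; with one that is not, each manifest variable offers two candidate forms, which is where the count of 2^p combinations comes from.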
If we decide to use a function not satisfying (iv) we have to find some extraneous grounds for preferring a particular form. In view of all that we have said about the arbitrariness involved this seems to be a formidable task.

The crucial assumption of conditional independence is more in the nature of a definition than an assumption. It is a formal statement of what we mean when we say that the variation among the x's is completely explained by their dependence on the y's. If the latent variables are constructs, the assumption could never be tested empirically and so it does not seem appropriate to speak of it in terms of being true or false. There is clearly much more to be said on this, and many other of the issues raised, and I hope that the debate will continue.
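The remark made earlier in this reply, that any change of variable in the integral leaves f(x) unchanged, can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the paper's procedure: it takes f(x) to be the integral over the latent variable of the product of item response probabilities, uses a one-factor logit response, and compares a uniform prior on (0, 1) combined with a probit transformation of y against a standard normal prior on z = Φ⁻¹(y). The item parameters and function names are hypothetical.

```python
import math
from itertools import product
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf and inv_cdf


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def conditional_prob(x, z, a0, a1):
    """P(x | z) for independent binary items with logit pi_i = a0[i] + a1[i] * z."""
    prob = 1.0
    for xi, b0, b1 in zip(x, a0, a1):
        pi = expit(b0 + b1 * z)
        prob *= pi if xi == 1 else 1.0 - pi
    return prob


def f_uniform_prior(x, a0, a1, n=20000):
    """f(x) with y ~ Uniform(0, 1) and z = Phi^{-1}(y) inside the response function."""
    total = 0.0
    for k in range(n):
        y = (k + 0.5) / n  # midpoint rule on (0, 1)
        total += conditional_prob(x, STD.inv_cdf(y), a0, a1)
    return total / n


def f_normal_prior(x, a0, a1, n=4000, lim=8.0):
    """The same f(x) after the change of variable, integrating over z ~ N(0, 1)."""
    h = 2.0 * lim / n
    total = 0.0
    for k in range(n):
        z = -lim + (k + 0.5) * h  # midpoint rule on [-lim, lim]
        total += conditional_prob(x, z, a0, a1) * STD.pdf(z)
    return total * h


A0 = [0.3, -0.5, 1.0, 0.0]  # hypothetical intercepts
A1 = [1.2, 0.8, 1.5, 0.6]   # hypothetical slopes

patterns = list(product([0, 1], repeat=4))
max_gap = max(
    abs(f_uniform_prior(x, A0, A1) - f_normal_prior(x, A0, A1))
    for x in patterns)
print(max_gap)
```

The two parametrisations give the same probability for every response pattern up to quadrature error, so data on x alone cannot favour one prior over the other.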

REFERENCES IN THE DISCUSSION

ANDERSEN, E. B. (1980). Comparing latent distributions. Psychometrika, 45, 121-134.
ANDERSEN, E. B. and MADSEN, M. (1977). Estimating the parameters of the latent population distribution. Psychometrika, 42, 357-374.
BARTHOLOMEW, D. J. (1981). Posterior analysis of the factor model. Brit. J. Math. Statist. Psychol., to appear.
BROWNE, M. W. (1974). Generalized least squares estimators in the analysis of covariance structures. Sth. Afr. Statist. J., 8, 1-24. Reprinted in Latent Variables in Socio-economic Models (D. J. Aigner and A. S. Goldberger, eds), pp. 205-226. Amsterdam: North-Holland.
DEMPSTER, A. P., LAIRD, N. M. and RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with Discussion). J. R. Statist. Soc. B, 39, 1-38.
GOLDSTEIN, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. Brit. J. Math. Statist. Psychol., in press.
HASSELBLAD, V., STEAD, A. G. and CREASON, J. P. (1980). Maximum likelihood estimation for multiple probit analysis. Biometrics, to appear.
JÖRESKOG, K. G. and GOLDBERGER, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. J. Amer. Statist. Ass., 70, 631-639.
JÖRESKOG, K. G. and SÖRBOM, D. (1974). Statistical models and methods for analysis of longitudinal data. In Latent Variables in Socioeconomic Models (D. J. Aigner and A. S. Goldberger, eds), pp. 285-325. Amsterdam: North Holland.
JÖRESKOG, K. G. and SÖRBOM, D. (1978). LISREL IV. A general computer program for estimation of linear structural models by maximum likelihood method. User's guide. Dept of Statistics, University of Uppsala.
LAWLEY, D. N. and MAXWELL, A. E. (1971). Factor Analysis as a Statistical Method, 2nd Ed. London: Butterworth.
MCKENZIE, . R. (1976). Some statistical identification problems. Ph.D. Thesis, Oxford University.
SKENE, A. M. (1978). Discrimination using latent structure models. In COMPSTAT 1978 (Corsten and Hermans, eds). Vienna: Physica-Verlag.