EXPLORATORY FACTOR ANALYSIS: MODEL SELECTION AND IDENTIFYING UNDERLYING SYMPTOMS
By
Matthew K. Cole
A thesis submitted to Johns Hopkins University in conformity with the requirements for the degree of Master of Science.
Baltimore, Maryland
September, 2017
Abstract:
Exploratory factor analysis (EFA) is a common yet powerful tool to better understand the theoretical structure of a set of variables. A core problem of conducting an EFA is determining the number of factors (m) to extract and examine. In this thesis, we examined the performance of existing methods of estimating m while proposing and assessing a cross-validated method for estimating m across various settings. These methods were then considered in a study incorporating EFA to assess the relationships among and categorization of self-reported symptoms of chronic rhinosinusitis (CRS), a common sinus inflammatory disease, within three cross-sectional questionnaires as well as within the changes in symptoms between questionnaires.
A cross-validated approach (trace) was developed by which m increases until the discrepancy between the implied correlation of a partition of data and the observed correlation of the other data partition increases. In order to assess the performance of this new method as well as other, common approaches, a simulation study was designed in which valid factor loading matrices were simulated using a new procedure, and random samples were drawn from their respective correlation matrices. The trace method displayed quickly increasing accuracy when more samples were drawn, a phenomenon not observed in other methods. Trace was also applied to the CRS data, suggesting 13 factors to be extracted, more than other methods. This non-agreement possibly highlights the differences in factor extraction interpretations, and the different meanings of “correct” m.
An EFA was carried out on self-reported CRS symptoms as well as changes in symptom responses over time in order to identify any relationships between or categorization of CRS symptoms. A total of 3535 primary care patients were included in this study, having responded to three questionnaires of 37 repeated questions spanning a 16-month period. After extracting factors from all three questionnaires and two symptom difference scores, five stable factors were identified in each. The factors of congestion and discharge, facial pain and pressure, smell loss, asthma and constitutional as well as ear and eye symptoms were consistent with the hypothesis that CRS symptoms are measuring several distinct biological processes.
Readers: Karen Bandeen-Roche, Brian Schwartz
Acknowledgments: I would first like to thank my thesis advisors and readers Dr. Karen Bandeen-Roche and Dr. Brian Schwartz of the Johns Hopkins Bloomberg School of Public Health, from whom I have learned so much. With their guidance and support I was allowed to work and develop ideas on my own while being steered back on track when needed. I must also express my gratitude to my parents, who have supported and encouraged me through my studies. Without their support, this work would not be possible.
Table of Contents
Abstract:
Acknowledgments:
Chapter 1 - Introduction
Chapter 2 - A Cross-Validated Approach to Exploratory Factor Analysis Model Selection
  Introduction
  Background
    Notation and Assumptions
    EFA and PCA
    Existing EFA Model Selection Strategies
  Novel Method for Model Selection
  Simulation Study
  Results
    Application to the CRS Study
  Discussion
  Future work
  Tables and Figures
  Appendix
    Additional Figures:
    Simulating Factor Model Correlation Matrices
Chapter 3 - Exploratory Factor Analysis of CRS Symptoms
  Introduction
  Methods
    Study population and design
    Data collection
    Analytic variables
    Statistical Analysis
    Sensitivity Analysis and Diagnostics
  Results
    Description of study subjects
    Cross-sectional EFAs
    Longitudinal difference EFAs
    Factor Scores
  Discussion
    Limitations and Further Work:
  Conclusion
  Tables & Figures
  Appendix
Chapter 4 - Conclusion
References
List of Tables
Chapter 2:
TABLE 1. NUMBER OF CORRECT FACTOR NUMBER ASSESSMENTS BY SAMPLE SIZE ADJUSTED BIC (SSBIC), STANDARD BIC (BIC), KAISER EIGENVALUES GREATER THAN 1 RULE (K1), PARALLEL ANALYSIS (PA), AND THE PROPOSED METHOD (TRACE) OUT OF 100 SIMULATION REPLICATES.
TABLE 2. ESTIMATED NUMBER OF FACTORS FOR EACH OF THE 3 QUESTIONNAIRES (BASELINE, 6 MONTH, AND 16 MONTH FOLLOW UPS) FROM COMMONLY UTILIZED METHODS INCLUDING KAISER EIGENVALUES GREATER THAN 1 RULE (K1), PARALLEL ANALYSIS (PA), STANDARD BIC (BIC), EMPIRICAL BIC (EBIC), SAMPLE SIZE ADJUSTED BIC (SSBIC), AND THE PROPOSED METHOD (TRACE).
Chapter 3:
TABLE 1. DEMOGRAPHIC INFORMATION OF THE 3535 PATIENTS INCLUDED IN THE CURRENT ANALYSIS AND THE 4312 PATIENTS WHO RETURNED THE BASELINE QUESTIONNAIRE BUT WERE NOT INCLUDED IN THE CURRENT ANALYSIS.
TABLE 2. QUESTIONS FOR THE THREE CROSS-SECTIONAL QUESTIONNAIRES. QUESTION RESPONSES WERE ON A 5-ITEM LIKERT SCALE*.
TABLE 3. FACTOR LOADINGS AND SYMPTOM COMMUNALITIES FROM THE EXPLORATORY FACTOR ANALYSIS (EFA) OF THE 37 PRESENCE, SEVERITY, AND SECONDARY CRS SYMPTOM AT BASELINE. THE EFA WAS FIT USING ORDINARY LEAST SQUARES AND AN OBLIMIN ROTATION (NUMBER OF PATIENTS = 3535). LOADINGS LESS THAN 0.3 WERE OMITTED FOR READABILITY. COMMUNALITIES REPRESENT THE FRACTION OF EACH SYMPTOM’S VARIABILITY THAT WAS CAPTURED BY THE UTILIZED FIVE FACTOR MODEL.
TABLE A1. FACTOR LOADINGS AND SYMPTOM COMMUNALITIES FROM THE EXPLORATORY FACTOR ANALYSIS (EFA) OF THE 37 PRESENCE, SEVERITY, AND SECONDARY CRS SYMPTOM AT 6 MONTH FOLLOW UP. THE EFA WAS FIT USING ORDINARY LEAST SQUARES AND AN OBLIMIN ROTATION (NUMBER OF PATIENTS = 3535). LOADINGS LESS THAN 0.3 WERE OMITTED FOR READABILITY. COMMUNALITIES REPRESENT THE FRACTION OF EACH SYMPTOM’S VARIABILITY THAT WAS CAPTURED BY THE UTILIZED FIVE FACTOR MODEL.
TABLE A2. FACTOR LOADINGS AND SYMPTOM COMMUNALITIES FROM THE EXPLORATORY FACTOR ANALYSIS (EFA) OF THE 37 PRESENCE, SEVERITY, AND SECONDARY CRS SYMPTOM AT 16 MONTH FOLLOW UP. THE EFA WAS FIT USING ORDINARY LEAST SQUARES AND AN OBLIMIN ROTATION (NUMBER OF PATIENTS = 3535). LOADINGS LESS THAN 0.3 WERE OMITTED FOR READABILITY. COMMUNALITIES REPRESENT THE FRACTION OF EACH SYMPTOM’S VARIABILITY THAT WAS CAPTURED BY THE UTILIZED FIVE FACTOR MODEL.
TABLE A3. FACTOR LOADINGS AND SYMPTOM COMMUNALITIES FROM THE EXPLORATORY FACTOR ANALYSIS (EFA) OF THE 37 PRESENCE, SEVERITY, AND SECONDARY CRS SYMPTOM CHANGES FROM BASELINE TO 6 MONTHS. EFA WAS FIT USING ORDINARY LEAST SQUARES AND AN OBLIMIN ROTATION (NUMBER OF PATIENTS = 3535). LOADINGS LESS THAN 0.3 WERE OMITTED FOR READABILITY. COMMUNALITIES REPRESENT THE FRACTION OF EACH SYMPTOM’S VARIABILITY THAT WAS CAPTURED BY THE UTILIZED FIVE FACTOR MODEL.
TABLE A4. SYMPTOM COMMUNALITIES FROM THE EXPLORATORY FACTOR ANALYSIS (EFA) OF THE 37 PRESENCE, SEVERITY, AND SECONDARY CRS SYMPTOM CHANGES FROM BASELINE TO 6 MONTHS AND 6 MONTHS TO 16 MONTHS. EFA WAS FIT USING ORDINARY LEAST SQUARES AND AN OBLIMIN ROTATION (NUMBER OF PATIENTS = 3535).
List of Figures
Chapter 2:
FIGURE 1. STRONG “BLOCKED” FACTOR LOADINGS UTILIZED TO CREATE THE CORRELATION MATRIX IN THE SIMULATION STUDY.
FIGURE 2. CORRELATION MATRIX GENERATED FROM THE STRONG “BLOCKED” FACTOR LOADINGS MATRIX.
FIGURE 3. TRACE FUNCTION’S DISCREPANCY VALUES ON THE BASELINE CRS DATA. VERTICAL LINE DENOTES THE MINIMUM ACHIEVED AT 13 FACTORS.
FIGURE A1. THE UTILIZED STRONG LOADING MATRIX, CREATED USING THE DIRICHLET SIMULATION PROCESS, CONSISTING OF 5 FACTORS AND 25 VARIABLES.
FIGURE A2. THE UTILIZED STRONG CORRELATION MATRIX, CREATED FROM THE CORRESPONDING STRONG LOADING MATRIX.
FIGURE A3. THE UTILIZED MODERATE LOADING MATRIX, CREATED USING THE DIRICHLET SIMULATION PROCESS, CONSISTING OF 5 FACTORS AND 25 VARIABLES.
FIGURE A4. THE UTILIZED MODERATE CORRELATION MATRIX, CREATED FROM THE CORRESPONDING MODERATE LOADING MATRIX.
FIGURE A5. THE UTILIZED WEAK LOADING MATRIX, CREATED USING THE DIRICHLET SIMULATION PROCESS, CORRESPONDING FROM 5 FACTORS AND 25 VARIABLES.
FIGURE A6. THE UTILIZED WEAK CORRELATION MATRIX, CREATED FROM THE CORRESPONDING WEAK LOADING MATRIX.
FIGURE A7. THE UTILIZED MODERATE/LOW DIMENSIONAL LOADING MATRIX, CREATED USING THE DIRICHLET SIMULATION PROCESS, CONSISTING OF 5 FACTORS AND 11 VARIABLES.
FIGURE A8. THE UTILIZED MODERATE/LOW DIMENSIONAL CORRELATION MATRIX, CREATED FROM THE CORRESPONDING MODERATE/LOW DIMENSIONAL LOADING MATRIX.
FIGURE A9. THE UTILIZED MODERATE/DIFFERENT DIMENSION LOADING MATRIX, CREATED USING THE DIRICHLET SIMULATION PROCESS, CONSISTING OF 5 FACTORS AND 27 VARIABLES.
FIGURE A10. THE UTILIZED MODERATE/DIFFERENT DIMENSIONAL CORRELATION MATRIX, CREATED FROM THE CORRESPONDING MODERATE/DIFFERENT DIMENSIONAL LOADING MATRIX.
FIGURE A11. THE UTILIZED TEN FACTOR LOADING MATRIX, CREATED USING THE DIRICHLET SIMULATION PROCESS, CONSISTING OF 10 FACTORS AND 100 VARIABLES.
FIGURE A12. THE UTILIZED TEN FACTOR CORRELATION MATRIX, CREATED FROM THE CORRESPONDING TEN FACTOR LOADING MATRIX.
FIGURE A13. TRACE FUNCTION’S DISCREPANCY VALUES ON THE STRONG “BLOCKED” FACTOR LOADING MATRIX. VERTICAL LINE DENOTES THE MINIMUM ACHIEVED AT 5 FACTORS.
Chapter 3:
FIGURE 1. LASAGNA PLOT DISPLAYING THE PROPORTION OF INDIVIDUALS WITH EACH GIVEN RESPONSE TO THE QUESTION “ON AVERAGE, HOW OFTEN IN THE PAST 3 MONTHS HAVE YOU HAD POST-NASAL DRIP?” AT BASELINE AND 6 MONTHS AND 16 MONTHS LATER (1 = NEVER, 2 = ONCE IN A WHILE, 3 = SOME OF THE TIME, 4 = MOST OF THE TIME, 5 = ALL THE TIME). Y-AXIS VALUES INDICATE THE NUMBER OF PATIENTS WITH EACH PARTICULAR RESPONSE AT BASELINE.
FIGURE 2. INTER-FACTOR CORRELATIONS AT BASELINE FROM THE BASELINE QUESTIONNAIRE EXPLORATORY FACTOR ANALYSIS FIT VIA ORDINARY LEAST SQUARES AND AN OBLIMIN ROTATION.
FIGURE 3. FACTOR 1 (CONGESTION AND DISCHARGE) SCORES BY CRS EPOS GROUPS AT BASELINE. FACTOR SCORES ACROSS FACTORS AND CRS EPOS GROUPS (CURRENT CRS, PREVIOUS CRS, NEVER CRS) FOR THE CONGESTION AND DISCHARGE FACTOR AT BASELINE WITH NUMBER OF INDIVIDUALS (N) IN EACH GROUP. FACTOR SCORES WERE ESTIMATED BY THE ITEM RESPONSE THEORY (IRT) BASED SCORES METHOD. X-AXIS WAS JITTERED TO IMPROVE READABILITY.
FIGURE 4. CONTINUOUS FACTOR SCORES CATEGORIZED TO SHOW LONGITUDINAL CHANGE ACROSS QUESTIONNAIRES FOR FACTOR 1 (CONGESTION AND DISCHARGE). FACTOR SCORES WERE CATEGORIZED AS: FACTOR SCORE < -1 WERE ASSIGNED VALUES OF -2; BETWEEN -0.5 AND -1, ASSIGNED -1; BETWEEN -0.5 AND 0.5, ASSIGNED 0; BETWEEN 0.5 AND 1, ASSIGNED 1; AND > 1, ASSIGNED 2. Y-AXIS LABELS INDICATE THE NUMBER OF PATIENTS AT BASELINE IN EACH ADJUSTED FACTOR SCORE GROUP. FACTOR SCORES WERE ESTIMATED BY THE IRT METHOD.
FIGURE A1. SCREE PLOT FOR THE BASELINE QUESTIONNAIRE DISPLAYING EIGENVALUES ON THE Y-AXIS AND THEIR CORRESPONDING FACTOR NUMBER ON THE X-AXIS.
FIGURE A2. SCREE PLOT FOR THE 6 MONTH FOLLOW UP QUESTIONNAIRE DISPLAYING EIGENVALUES ON THE Y-AXIS AND THEIR CORRESPONDING FACTOR NUMBER ON THE X-AXIS.
FIGURE A3. SCREE PLOT FOR THE 16 MONTH FOLLOW UP QUESTIONNAIRE DISPLAYING EIGENVALUES ON THE Y-AXIS AND THEIR CORRESPONDING FACTOR NUMBER ON THE X-AXIS.
FIGURE A4. SCREE PLOT FOR THE FIRST DIFFERENCE (BASELINE TO 6 MONTH QUESTIONNAIRES) DISPLAYING EIGENVALUES ON THE Y-AXIS AND THEIR CORRESPONDING FACTOR NUMBER ON THE X-AXIS.
FIGURE A5. SCREE PLOT FOR THE SECOND DIFFERENCE (6 TO 16 MONTH QUESTIONNAIRES) DISPLAYING EIGENVALUES ON THE Y-AXIS AND THEIR CORRESPONDING FACTOR NUMBER ON THE X-AXIS.
Chapter 1 - Introduction
Exploratory factor analysis (EFA) is a statistical method utilized to investigate and summarize the joint distribution of a collection of variables through the estimation of the relationship between these observed variables and unobserved but theorized factors. It relies on the assumption that covariance among measured variables arises from a smaller set of latent factors which are associated, to varying degrees, with each observable variable (known as the common factor model assumption). These methods are commonly utilized when there is little to no a priori knowledge about the latent structure associated with variables, and have been employed across a variety of scientific domains. Commonly, EFAs are employed in an attempt to better understand related phenomena, to allow researchers to create scales, and as an intuitive way to study what a collection of observations is measuring.
Despite being a long-established technique, considerable difficulty is still encountered when employing this approach. The most common issue practitioners face while attempting to utilize an EFA is which factor model to utilize: largely, how many factors to incorporate. Determining the number of factors to include (𝑚) can be a difficult problem, as a subtle change in 𝑚 can vastly impact the results and interpretations of the analysis, and the choice of 𝑚 itself can be very ambiguous. Furthermore, the problem of selecting 𝑚 has been approached from a variety of angles, none of which has earned consensus agreement as a gold standard method. Part of this thesis includes work studying existing methods for choosing the number of factors (𝑚), while proposing and examining a new method which incorporates a cross-validated approach to selecting 𝑚.
In addition, we applied the principles studied to an analysis of chronic rhinosinusitis (CRS) symptom data collected from patients identified in a large, integrated health system. A total of 3535 patients responded to 37 questions pertaining to a spectrum of CRS symptoms three times (baseline, 6-month and 16-month follow-up questionnaires). The CRS study, in fact, motivated the statistical work. EFA was conducted to better understand relationships among and categorization of CRS and CRS-related symptoms. EFAs were conducted for each of the three questionnaires as well as for the change in reported symptom frequency (baseline to 6 months, and 6 months to 16 months). We were able to identify five similar factors in each of these analyses, each with a biologically plausible pathological explanation, suggesting that there may be real phenomena driving these observations. At the same time, the selection of five as the factor number was better evidenced in some periods than others, and by some methods than others, illustrating the challenges studied in our first paper.
This thesis comprises two papers, one that utilized EFA in order to explore the covariance structure of self-reported symptoms related to CRS and common, co-morbid conditions, while the other studied the various methods of determining the number of factors in EFA settings and proposed another method to do so. It begins with the methodological work and then proceeds to the detailed CRS analysis. These papers work in tandem by examining the theory and challenges from an analytic and philosophical standpoint of selecting the “optimal” number of factors, while estimating the number of factors and putting EFA to work in a real-world setting involving a common disease. A concluding chapter provides synthesis and identifies areas for future work.
Chapter 2 - A Cross-Validated Approach to Exploratory Factor Analysis Model Selection
Introduction
Frequently it is of public health interest to characterize attributes of data that cannot be measured directly. Instead, a group of observable variables that indirectly characterize the unobserved is collected. In such settings, it is often of importance to explore the underlying, “latent” structure of data in addition to the observed manifest variables. Ideally, in doing so we can study unobserved, latent variables which may be more interpretable or of greater importance than their measured counterparts. One method of estimating latent structure is through factor analysis (FA). This method is common in diverse fields ranging from psychology and economics to health and spirituality (Fabrigar, Wegener, MacCallum, & Strahan, 1999; Hirose, Kawano, Konishi, & Ichikawa, 2011; Underwood & Teresi, 2002).
There are two general cases of factor analysis, exploratory and confirmatory. Exploratory factor analysis (EFA) aims to identify the underlying structure of variables with little a priori knowledge of any such relationship, while confirmatory factor analysis is utilized to test whether a proposed latent structure adequately fits the observed data. While both are useful and powerful techniques, exploratory factor analysis requires the additional step of choosing 𝑚, the estimated number of factors characterizing the observed data distribution, a model selection problem which will be considered below. This model selection is important as both the quantitative and qualitative results of an EFA may rely heavily on this selection, such that different specifications of 𝑚 may potentially lead to altered interpretations and inferences. Current methods of determining 𝑚 vary with respect to computation as well as theory, some utilizing likelihood-based methods, including Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), to assess the hypothesis that the data are generated from models with specific 𝑚, while others utilize properties of correlation matrices (in terms of eigenvalues), or other criteria to test model fit. Some methods of estimating 𝑚 are useful, but most have drawbacks as well.
The FA approach is based on the common factor model by which observed variables are conceptualized to arise from 3 components: common factors, unique variability, and measurement errors/noise (Brown, 2014). The core equation of the common factor model is as follows in scalar notation:
$$y_{i,j} = \sum_{g=1}^{m} \lambda_{j,g}\,\nu_{i,g} + \epsilon_{i,j}$$
where $y_{i,j}$ is the value of variable $j$ for person $i$, $\nu_{i,g}$ is the $g$th factor variable for person $i$, $\lambda_{j,g}$ is the “loading” of the $j$th variable onto the $g$th factor, and $\epsilon_{i,j}$ is the residual term for person $i$ and variable $j$, which remains unexplained by the factor model and is specific to the $j$th variable: the sum of unique natural variation and measurement error (Brown, 2014). Without repeated observations, the noise term is not distinguishable from the natural variation term, and in the above equation, both are contained in $\epsilon_{i,j}$. The factors impact, to varying degrees as reflected by their loadings, many observed variables.
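As an illustration, the scalar model above can be simulated directly. The sketch below is a minimal example, assuming orthogonal, standard normal factors and residual variances chosen so that each observed variable has unit variance. The function name `simulate_common_factor_data` and the 6-by-2 loading matrix are hypothetical and do not come from the thesis.

```python
import numpy as np


def simulate_common_factor_data(loadings, n_obs, seed=0):
    """Draw n_obs rows from y_i = Lambda @ nu_i + eps_i.

    Assumptions (for illustration only): factors are orthogonal with unit
    variance, and residual variances are 1 minus the row sums of squared
    loadings, so every simulated variable has variance 1.
    """
    rng = np.random.default_rng(seed)
    loadings = np.asarray(loadings, dtype=float)
    p, m = loadings.shape
    communality = (loadings ** 2).sum(axis=1)
    if np.any(communality >= 1.0):
        raise ValueError("squared loadings in each row must sum to less than 1")
    uniqueness = 1.0 - communality
    factors = rng.standard_normal((n_obs, m))                            # nu_{i,g}
    residuals = rng.standard_normal((n_obs, p)) * np.sqrt(uniqueness)    # eps_{i,j}
    return factors @ loadings.T + residuals


# Hypothetical loading matrix: 6 variables, 2 factors, simple "blocked" structure.
loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
y = simulate_common_factor_data(loadings, n_obs=5000)
sample_corr = np.corrcoef(y, rowvar=False)
```

Variables that load on the same factor should show sample correlations near the product of their loadings, while variables in different blocks should be close to uncorrelated.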
It was our aim to introduce a cross-validated approach by which 𝑚 was chosen to be the value that minimized the difference between the distribution implied by an estimated factor model and the distribution empirically characterizing independently held-out data.
Factor analysis model selection is difficult and commonly relies on qualitative or biased methods. There is a need to explore alternative methods of identifying 𝑚 which display desirable properties, such as close proximity to an underlying data distribution or reproducibility. We also perceive a need to evaluate methods in settings in which the number of observed variables is large relative to the sample size. As elaborated shortly, current methods can be inaccurate across some or all testing attributes (e.g. correlation strength, factor structure, and sample size). The proposed method utilizes a cross-validated approach to determine when the addition of a factor does not summarize additional common variability in the EFA model. In more precise terms, the proposed method continually increases the number of factors (increasing 𝑚) until the difference between the observed and implied correlation matrices, as measured by a discrepancy function, increases. Cross-validation helps us avoid the following pitfall: as the number of factors (𝑚) increases towards the number of variables, 𝑝, in a single sample, the 'difference' between the observed and implied correlation matrix will decrease, even if only due to noise. By incorporating a cross-validated approach, we expect that the addition of a factor that only characterizes noise in one partition should actually worsen fit in another. When this phenomenon is observed, we propose that the previously utilized number of factors is a better fit for the data at hand.
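To make the stopping rule concrete, the following is a minimal sketch and not the thesis's trace implementation. It assumes a single random half split, iterated principal-axis factoring as a simple stand-in for the EFA fit, and a sum of squared differences as the discrepancy function. All function names and the simulated two-factor data are hypothetical.

```python
import numpy as np


def principal_axis_loadings(R, m, n_iter=50):
    """Iterated principal-axis factoring (an assumed stand-in for a full EFA fit)."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)           # ascending eigenvalues
        top = np.argsort(vals)[::-1][:m]               # indices of the m largest
        L = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        h2 = np.clip((L ** 2).sum(axis=1), 0.0, 0.995)  # guard against Heywood cases
    return L


def implied_corr(L):
    """Correlation matrix implied by orthogonal loadings, with unit diagonal."""
    P = L @ L.T
    np.fill_diagonal(P, 1.0)
    return P


def cv_number_of_factors(data, max_m, seed=0):
    """Increase m until the held-out discrepancy first increases."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(data.shape[0])
    half = len(idx) // 2
    R_fit = np.corrcoef(data[idx[:half]], rowvar=False)    # partition used for fitting
    R_held = np.corrcoef(data[idx[half:]], rowvar=False)   # held-out partition
    discrepancies = []
    for m in range(1, max_m + 1):
        P = implied_corr(principal_axis_loadings(R_fit, m))
        discrepancies.append(float(np.sum((R_held - P) ** 2)))
        if m > 1 and discrepancies[-1] > discrepancies[-2]:
            return m - 1, discrepancies
    return max_m, discrepancies


# Hypothetical data: 8 variables generated from 2 orthogonal factors.
rng = np.random.default_rng(1)
true_loadings = np.zeros((8, 2))
true_loadings[:4, 0] = [0.8, 0.7, 0.6, 0.5]
true_loadings[4:, 1] = [0.8, 0.7, 0.6, 0.5]
factors = rng.standard_normal((2000, 2))
noise = rng.standard_normal((2000, 8)) * np.sqrt(1.0 - (true_loadings ** 2).sum(axis=1))
data = factors @ true_loadings.T + noise

m_hat, discrepancies = cv_number_of_factors(data, max_m=5)
```

In this sketch the held-out discrepancy should drop sharply between one and two factors. Where it first turns upward depends on the split and on the fitting method, so the returned value is best read alongside the full discrepancy sequence.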
This study was motivated by a project investigating chronic rhinosinusitis (CRS), a sinus inflammatory disease impacting approximately 15% of the United States adult population (Tan, Kern, Schleimer, & Schwartz, 2013). CRS is commonly defined by the presence of 4 cardinal symptoms associated with sinus swelling persisting for an extended period of time, but its diagnosis typically also requires objective evidence of inflammation such as by computerized tomography (CT) scanning. One barrier to the effective diagnosis and treatment of the disease is that the connection between objective inflammation and patient symptoms of sinus opacification is not well understood (Wj et al., 2012). Additionally, obtaining objective evidence can be difficult in resource-limited or high-volume settings, making an improved symptom-based method of diagnosis desirable. To begin addressing these barriers, the motivating study assessed a large sample of patients from a large, integrated health system for presence and severity of a large number of sinus-related symptoms. EFA was proposed as a method to summarize symptom clustering and hence facilitate the subsequent study of symptom relationships with objective evidence of CRS (Cole, Schwartz, & Bandeen-Roche, 2017).
In the remainder of our paper, we examine existing methods for estimating the number of factors in EFA settings as a background for this study, discussing some previously established strengths and weaknesses of each. Then we introduce the proposed method and provide a simulation study comparing efficacy of methods across potential circumstances including sample sizes, correlation strengths and distributions, as well as number of variables. Results are compared across methods, correlation structures, and sample sizes. Finally, we apply the various methods for selecting the number of factors to the CRS study. The CRS findings themselves are presented in the next thesis paper.
Background
Notation and Assumptions
Exploratory factor analysis aims to represent a multivariate data distribution (commonly through the correlation matrix, $R$) according to the common factor model. This model is characterized by parameters we collectively label as “$\theta$”, consisting of a $p \times m$ loading matrix, $\Lambda$, an $m \times m$ inter-factor correlation matrix, $\Psi$, and a $p \times p$ unique variance (diagonal) matrix, $\Delta^2$. Because each $\theta$ “implies” exactly one correlation matrix, $P(\theta)$, which is the model’s characterization of the matrix of correlations among the observed variables, we can make statements about an EFA's implied correlation matrix, which has the form $P(\theta) = \Lambda \Psi \Lambda' + \Delta^2$.
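The implied correlation matrix can be computed in a few lines. The sketch below is illustrative only: it assumes the unique variances are set so that the implied matrix has a unit diagonal, and the function name `implied_correlation` and the example loadings are hypothetical.

```python
import numpy as np


def implied_correlation(loadings, factor_corr=None):
    """Return P(theta) = Lambda Psi Lambda' + Delta^2 and the diagonal of Delta^2.

    Assumption: Delta^2 is chosen so that diag(P) = 1, i.e. the model is
    parameterized on the correlation scale.
    """
    loadings = np.asarray(loadings, dtype=float)
    m = loadings.shape[1]
    # Psi defaults to the identity (orthogonal factors).
    psi = np.eye(m) if factor_corr is None else np.asarray(factor_corr, dtype=float)
    common = loadings @ psi @ loadings.T       # Lambda Psi Lambda'
    uniqueness = 1.0 - np.diag(common)         # diagonal of Delta^2
    return common + np.diag(uniqueness), uniqueness


# Hypothetical loadings: 6 variables, 2 factors.
loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
P_orthogonal, uniqueness = implied_correlation(loadings)
P_oblique, _ = implied_correlation(loadings, factor_corr=[[1.0, 0.3], [0.3, 1.0]])
```

With orthogonal factors, variables in different blocks have an implied correlation of zero. Allowing the two factors to correlate at 0.3 induces nonzero cross-block correlations through $\Psi$.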
The common factor model represents the measured variables as functions of latent (unobserved) factors as well as model parameters, most notably factor loadings. In matrix notation (scalar representation has been previously provided), $\boldsymbol{y}_i = \Lambda \zeta + \delta$, where $\boldsymbol{y}_i$ is the ($p \times 1$) vector of observed variables for person $i$, $\Lambda$ is, again, the ($p \times m$) loading matrix, $\zeta$ is the ($m \times 1$) latent factor vector, and $\delta$ is the ($p \times 1$) vector of error terms for each individual. The elements of $\delta$ are assumed to be independent of one another and of $\zeta$.
Common assumptions of this model are that the error terms are mutually independent and independent of the factor variables, and that the collection of the factor and error term variables is multivariate normally distributed. The variances of variable-specific residual terms, $\mathrm{Var}(\epsilon_j)$, are referred to as "uniqueness" terms that remain unexplained by the common factors, leaving $\mathrm{Var}(Y_j) - \mathrm{Var}(\epsilon_j)$ as the “common” or shared variance (“communality”) in a given observed variable that is attributable to its factor contributions.
For the remainder of this paper, 𝑁 will denote the sample size.
EFA and PCA
It may be worth making a quick note of the difference between EFA and principal components analysis (PCA), two related and commonly confused, yet distinct, techniques. The aim of PCA is to reduce the dimensionality of data while retaining as much information (variability) as possible through the conversion of correlated variables into a set of uncorrelated linear combinations of (often standardized) observed variables called principal components. EFA is a model-based analysis that aims to identify the relationship between hypothesized latent, but potentially related, factors and the observed variables. While both of these methods are dimension reduction techniques, each is used to answer very different questions, and they are not interchangeable.
Existing EFA Model Selection Strategies
There are many common approaches to EFA model selection, each selecting an estimate $\hat{m}$ based on some criteria that try to identify an 'optimal' number of factors, 𝑚. If an appropriate choice for 𝑚 exists, choosing $\hat{m} > m$ is called overfactoring. Overfactoring may result in a misunderstanding of real latent constructs present in data, as true factors may be superfluously split into several factors, or factors which have 'random' low-loading variables may appear (Norris & Lecavalier, 2010). On the other hand, choosing $\hat{m} < m$ is called underfactoring. Underfactoring is considered to be a more serious concern, as observed variables will load erroneously onto factors to which they don't belong, providing misleading evidence for factor identity (Norris & Lecavalier, 2010).
Several strategies for estimating 𝑚 rely heavily on assessing and interpreting the eigenvalues of a covariance or correlation matrix, either that of the observed variables or that which is estimated to arise from the common factor portion of the factor model. For the remainder of this paper we will assume that analyses are based on correlation matrices: this has the advantage of standardizing all variances to equal one and all covariation measures to lie in the range [-1, 1]. In either case, the eigenvalue of a factor is representative of the amount of variability the factor contributes to the sum of variable variances in most factoring methods (Norris & Lecavalier, 2010). Computationally, a factor's eigenvalue with respect to the observed variable correlation matrix (when factors are orthogonal) is its sum of squared loadings (Norris & Lecavalier, 2010).
There are graphical methods that employ correlation matrix eigenvalues as well, such as Cattell’s scree test (Cattell, 1966). This test consists of examining a plot of eigenvalues versus factor index (1, 2, 3, …) to visually select the number of factors to extract. Ideally, there will be an initial steep drop in eigenvalues followed by a clear leveling out in the trend of remaining decrease, creating a classic 'elbow' shape (Cattell, 1966). The number of eigenvalues before the elbow is the proposed number of factors. While intuitive, a clear problem with this technique is that there is not always a clear ‘elbow shape’, leaving potential for varying interpretation by different investigators.
One of the most popular methods for estimating the number of factors is the Kaiser test (K1), which is based on $e_1, e_2, \ldots, e_p$, the eigenvalues of the observed variable correlation matrix. K1 chooses the number of factors to be equal to the number of eigenvalues greater than 1, $m = \sum_{i=1}^{p} I(e_i > 1)$ (Cattell, 1966; Kaiser, 1960). This method has an intuitive rationale: one retains the factors accounting for more variability than a single standardized variable (Velicer, Eaton, & Fava, 2000). It tends to overestimate the number of components in PCA settings, however, potentially due to random noise pushing 'borderline' eigenvalues over the 'threshold' of 1 (Velicer et al., 2000; Zwick & Velicer, 1984).
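The K1 rule reduces to counting eigenvalues. A minimal sketch follows, in which the function name `kaiser_k1` and the example correlation matrix (built from hypothetical two-factor loadings) are assumptions for illustration.

```python
import numpy as np


def kaiser_k1(R):
    """Number of eigenvalues of the correlation matrix R that exceed 1."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(R, dtype=float))
    return int(np.sum(eigenvalues > 1.0))


# Hypothetical population correlation matrix implied by 2 orthogonal factors.
loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
R = loadings @ loadings.T
np.fill_diagonal(R, 1.0)

k1_estimate = kaiser_k1(R)
```

Applied to a sample correlation matrix rather than a population one, sampling noise can move borderline eigenvalues across the threshold, which is the overestimation concern described above.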
A popular extension of the K1 test is parallel analysis, by which observed eigenvalues are
compared (typically, plotted) against a measure of central tendency for eigenvalues
simulated from many randomly generated independent (noise) correlation structures
(Humphreys & Montanelli, 1975; Timmerman & Lorenzo-Seva, 2011). The reference value
may be the mean, the median, some other percentile, or even a single realization of simulated
eigenvalues (Humphreys & Montanelli, 1975). The number of factors selected is the number of
observed eigenvalues that are greater than the corresponding simulated eigenvalues (Horn, 1965).
Parallel analysis has been considered to be one of the most powerful and accurate methods of
determining the number of factors to extract, and has been shown to perform better than the K1,
scree, and other methods while being relatively easy to understand (Velicer et al., 2000; Zwick
& Velicer, 1984). Parallel analysis is sensitive to sample size, however, with increased $N$
commonly resulting in more factors being retained, potentially more than is optimal (Velicer
et al., 2000).
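A minimal sketch of parallel analysis follows, assuming NumPy, eigenvalues of the full correlation matrix, and a 95th-percentile reference. The function name, its defaults, and the rule of stopping at the first eigenvalue that fails to exceed its reference are assumptions made for illustration, not the implementation used in this thesis.

```python
import numpy as np

def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Compare observed eigenvalues with those of independent-noise data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))  # same shape, no common factors
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[s] = np.sort(eig)[::-1]
    reference = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > reference
    # Number of leading eigenvalues above the reference (argmin finds the first False).
    m_hat = p if exceeds.all() else int(np.argmin(exceeds))
    return m_hat, observed, reference

# Hypothetical data: 6 variables driven by 2 orthogonal factors, N = 500.
rng = np.random.default_rng(1)
loadings = np.zeros((6, 2))
loadings[:3, 0] = 0.8
loadings[3:, 1] = 0.8
X = rng.standard_normal((500, 2)) @ loadings.T + 0.6 * rng.standard_normal((500, 6))
m_hat, observed, reference = parallel_analysis(X)
```

Swapping `percentile` for the mean or median changes only the `reference` line, which is where the variants mentioned above differ.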
Other methods do not explicitly consider eigenvalues and instead take a more
traditional model selection approach. Such methods include evaluating likelihoods and
functions of likelihoods (Preacher, Zhang, Kim, & Mels, 2013). The likelihood utilized in factor
analysis, once logarithmically transformed, is given as
$\log(\mathcal{L}) = -\frac{1}{2} N \left( \log|P(\theta)| + \mathrm{tr}(P(\theta)^{-1} R) \right)$
when each of the $N$ observations is taken to be normally distributed and independent of one
another (Akaike, 1987). Likelihood ratio tests can be constructed to assess the null hypothesis
that the observed data are generated from a factor model with a specific $m$ (Preacher et al.,
2013). Typically, one continues to increase $m$ until the factor model fits the data (the lowest
$m$ for which the null is not rejected by the likelihood ratio test). Unfortunately, this method
comes with several drawbacks. With large sample sizes, even 'small' discrepancies between model
and observed data cause a rejection, while with small sample sizes, large discrepancies may not
be identified, leaving performance to be determined largely by the given sample size
(Norris & Lecavalier, 2010).
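The log-likelihood expression above can be evaluated directly. The sketch below assumes NumPy, takes $R$ to be the observed correlation matrix and $P$ the model-implied one, and uses a hypothetical one-factor loading vector purely for illustration.

```python
import numpy as np

def factor_log_likelihood(R, P, N):
    """Evaluate -0.5 * N * (log|P| + tr(P^{-1} R)), with additive constants dropped."""
    sign, logdet = np.linalg.slogdet(P)
    if sign <= 0:
        raise ValueError("implied matrix P must be positive definite")
    return -0.5 * N * (logdet + np.trace(np.linalg.solve(P, R)))

# Hypothetical observed correlations and a one-factor implied matrix.
R = np.array([[1.00, 0.45, 0.30],
              [0.45, 1.00, 0.35],
              [0.30, 0.35, 1.00]])
lam = np.array([[0.7], [0.6], [0.5]])  # illustrative loadings, not estimates
P = lam @ lam.T
np.fill_diagonal(P, 1.0)               # unit diagonal: correlation metric

ll_model = factor_log_likelihood(R, P, N=300)
ll_saturated = factor_log_likelihood(R, R, N=300)  # the best attainable value
```

Because both terms are multiplied by $N$, the gap between `ll_saturated` and `ll_model` grows with the sample size even when the residuals are fixed, which is the sample size sensitivity noted above.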
In addition to likelihood ratios, various extensions including the AIC and BIC, as well as
BIC derivatives such as sample size adjusted BIC (SSBIC) and empirical BIC (EBIC), have been
frequently utilized in a factor analysis model selection framework (Hirose et al., 2011; Lopes &
West, 2004; Press & Shigemasu, 1999).
For the orthogonal $m$-factor model, $AIC(m) = -2\log(\mathcal{L}(m)) + [2p(m+1) - m(m-1)]$
(Akaike, 1987), $BIC(m) = -2\log(\mathcal{L}(m)) + \log(n)[p(m+1) - 0.5m(m-1)]$ (Lopes &
West, 2004), and $SSBIC = -2\log(\mathcal{L}(m)) + m\log((n+2)/24)$ (Sclove, 1987), where $m$ is
the number of factors, $p$ is the number of observed variables, and $n$ is the number of
observations utilized (Akaike, 1973). Although closely related, these methods can produce very
different estimates of $m$ (Hirose et al., 2011). This is not surprising, because AIC is directed
toward optimizing prediction, whereas BIC was designed to identify “true” model complexity
(Akaike, 1973; Schwarz, 1978). In addition, other extensions of AIC and BIC, such as generalized
BIC, have been produced and applied in the context of factor analysis (Hirose et al., 2011).
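The three criteria differ only in their penalty terms, which a short sketch makes explicit. The function names and the log-likelihood value below are hypothetical; the formulas are transcribed from the text above, including the SSBIC penalty as stated there.

```python
import math

def n_free_parameters(p, m):
    """Free parameters of the orthogonal m-factor model: p(m + 1) - m(m - 1)/2."""
    return p * (m + 1) - 0.5 * m * (m - 1)

def information_criteria(loglik, p, m, n):
    """AIC, BIC and SSBIC as written in the text; loglik is log L(m)."""
    k = n_free_parameters(p, m)
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "BIC": -2.0 * loglik + math.log(n) * k,
        "SSBIC": -2.0 * loglik + m * math.log((n + 2) / 24),
    }

# Hypothetical fit: 25 variables, 5 factors, 500 observations.
ic = information_criteria(loglik=-1000.0, p=25, m=5, n=500)
```

With these settings the model has 140 free parameters, so the AIC penalty is 280 while the BIC penalty is 140 log(500), roughly three times larger. In practice each criterion is computed over a range of $m$ and the minimizing $m$ is selected.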
Bootstrapping methods have been proposed to provide an alternative method of
determining the number of factors while producing a measure of uncertainty (Thompson,
1988). Some such procedures draw bootstrap samples and then estimate $m$ for each sample
using a common approach such as parallel analysis; by doing this, one could obtain a bootstrap
interval for $m$, which could inform an appropriate range of values for $m$. In similar fashion,
bootstrap intervals for communality, loadings, and inter-factor correlation measures could also
be provided. Other, recent methodological work has shown the efficacy of cross validated
methodologies as well. A “bi-cross-validation” technique proposed by Owen & Wang (2016)
randomly holds out submatrices of the data matrix, against which factor model predictions
developed on the remaining components of the data matrix are tested. This method has been
shown to outperform a variety of other methods, even parallel analysis, under certain
simulated situations (Owen & Wang, 2016). This method evaluates predictions with respect to a
submatrix of the observed data matrix, in contrast to the method we shortly propose, which
evaluates predictions with respect to a correlation matrix based on a subset of the sampled
observations. It also uses a distinct, multi-step procedure to develop predictions and has a
distinct goal of recovering the underlying factor prediction, rather than a “true” number of
factors, $m$ (Owen & Wang, 2016).
Novel Method for Model Selection
In the ordinary least squares method for estimating the factor analysis model, the
discrepancy between the observed and factor-implied correlation matrices can be measured by
the following discrepancy function (Lee, Zhang, & Edwards, 2012):
$f = \frac{1}{2}\mathrm{tr}\left[(R - P(\theta))^2\right]$,
where the trace function is the sum of diagonal matrix entries. We can assess factor model fit
by tracking the value of $f$ at varying numbers of factors. If fitting and assessment are conducted
within one and the same data set, we expect that $f$ will improve (decrease) as $m$, the number of
factors, increases. When cross-validation is applied, however, with fitting conducted in one
subset and testing applied in another, we expect that $f$ will improve as $m$ increases to a point,
but then worsen as one exceeds the dimension needed to characterize the true data
distribution. The specific procedure we propose is as follows:
Procedure:
1. Randomly split the data into a training and a testing set ($h = 2$).
2. Calculate the correlation matrix for the training set and for the testing set.
3. Fit factor models (using $m$ from 2 to $p$) using the training correlation matrix.
4. Compute the value of the trace function comparing the implied correlation matrix from the training data to the observed test correlation matrix.
5. Find where $f$ increases and stop there. Our choice ($\hat{m}$) is the last 'step' before $f$ increases. If $f$ never increases, our best estimate of $m$ will be $p$, indicating that a factor model is not a parsimonious fit for the data at hand.
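The five steps above can be sketched as follows. This is an illustration under stated assumptions rather than the thesis code: the fit uses iterated principal-axis factoring as a simple stand-in for the OLS (minres) fit provided by the psych R package, the function names are hypothetical, and the training half is assumed to contain more observations than variables so that its correlation matrix can be inverted.

```python
import numpy as np

def fit_principal_axis(R, m, n_iter=200, tol=1e-6):
    """Iterated principal-axis factoring (stand-in for an OLS/minres fit)."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # start from squared multiple correlations
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)           # communalities on the diagonal
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:m]
        lam = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = np.clip(np.sum(lam ** 2, axis=1), 0.0, 1.0)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return lam

def implied_correlation(lam):
    """Model-implied correlation matrix with unit diagonal."""
    P = lam @ lam.T
    np.fill_diagonal(P, 1.0)
    return P

def discrepancy(R, P):
    """f = 0.5 * tr[(R - P)^2]."""
    D = R - P
    return 0.5 * np.trace(D @ D)

def trace_cv(X, m_min=2, m_max=None, seed=0):
    """Steps 1-5: split, fit on training half, score on test half, stop when f rises."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    order = rng.permutation(n)
    R_train = np.corrcoef(X[order[: n // 2]], rowvar=False)
    R_test = np.corrcoef(X[order[n // 2:]], rowvar=False)
    m_max = m_max or p
    f_values = []
    for m in range(m_min, m_max + 1):
        f = discrepancy(R_test, implied_correlation(fit_principal_axis(R_train, m)))
        if f_values and f > f_values[-1]:
            return m - 1, f_values              # last step before f increased
        f_values.append(f)
    return m_max, f_values                      # f never increased

# Hypothetical data: 9 variables, 3 orthogonal factors, N = 600.
rng = np.random.default_rng(42)
lam_true = np.zeros((9, 3))
for j in range(3):
    lam_true[3 * j: 3 * j + 3, j] = 0.8
X = rng.standard_normal((600, 3)) @ lam_true.T + 0.6 * rng.standard_normal((600, 9))
m_hat, f_values = trace_cv(X)
```

Because a single random split is used, `m_hat` can change with `seed`; averaging the discrepancy over several splits would be one way to reduce that variability.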
Simulation Study
We assessed the effectiveness of our proposed method relative to common existing
model selection methods in a simulation study. Random samples from factor models with
known $m$ were generated, and the proportion of samples for which $\hat{m} = m$ was estimated
for a variety of correlation structures arising from different factor models. Factor models were
fit using the ordinary least squares (OLS) method as implemented by the psych R package (Lee et
al., 2012; R Core Team, 2016; Revelle, 2017). It was our aim to compare findings over
correlation structures varying in several aspects, including strengths of loadings,
number of observed variables, and distribution of loadings among factors. The procedure
outlined below was utilized in order to provide loadings/correlation matrices with both
structure and elements of randomness, in the hope of approximating real-world situations.
Nine factor structures were utilized, six from a simulated theoretical framework and
three incorporating the implied correlation matrices from the CRS EFA which motivated this
study. EFA structures for five-factor models (representing assessments at baseline, 6 month
follow up, and 16 month follow up) were utilized in order to provide structures which
approximated a complex empirical scenario while being determined in advance, and thus
usable for assessing the accuracy of model selection methods. Out of the six theoretical
matrices, three “blocked” matrices (named weak, moderate, and strong) contained 25 variables
and 5 factors, with each factor having exactly 5 variables loading onto it weakly, moderately, or
strongly (mean loadings = 0.38, 0.56, and 0.71 respectively) and remaining loadings only
minimal (mean loadings = 0.15, 0.11, and 0.07 respectively). A “moderate, low dimensional”
matrix represented a model including 5 factors and 11 observed variables, with 3 variables
loading moderately (mean loadings = 0.56) onto the first factor while each of the remaining
factors had 2 unique variables loading moderately; minimal loadings were 0.11 on average. A
“moderate, different dimensional” matrix contained 5 factors and 27 variables, with factors
loading on 10, 7, 5, 3, and 2 variables respectively with mean loading of 0.56 while minimal
loadings were 0.11 on average. A 10-factor, 100 variable matrix was also utilized, with 10
variables loading heavily onto each factor (mean loadings = 0.037) and 90 variables loading
minimally (mean loadings = 0.007).
Theoretical matrices described above were generated as follows. By treating each of the $p$
rows of the loading matrix $\Lambda$ as an $m$-dimensional Dirichlet vector with parameters
$\alpha_{i,1}, \ldots, \alpha_{i,m}$, we can generate a valid loading matrix for which the sum of squares
for each row is less than or equal to 1 (avoiding a Heywood case), and each factor matrix entry is
between -1 and 1. We can then compute the unique variances as
$\Delta^2 = I - \mathrm{diag}(\Lambda\Lambda')$. The matrix $\Lambda\Lambda' + \Delta^2$ represents our
simulated correlation matrix generated from a known factor structure: Figures 1 and 2 illustrate
the result for the strong “blocked” correlation design, and others are illustrated in the
appendix. This factor structure is completely determined by the choice of $p$, $m$, and all Dirichlet
parameter values. Although this approach is simple, it is capable of generating a wide range of
structures (see appendix).
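A minimal sketch of this generation step is given below, assuming NumPy. The concentration values (10 for the dominant factor, 1 elsewhere) are hypothetical choices that give mean loadings near 0.71 and 0.07, similar to the strong “blocked” design; the Dirichlet parameters actually used in the thesis are not stated in this section.

```python
import numpy as np

def simulate_loadings(alpha, seed=0):
    """Draw each row of the loading matrix from a Dirichlet with that row's parameters."""
    rng = np.random.default_rng(seed)
    return np.vstack([rng.dirichlet(row) for row in np.asarray(alpha, dtype=float)])

def loadings_to_correlation(lam):
    """rho = Lambda Lambda' + (I - diag(Lambda Lambda')) for orthogonal factors."""
    common = lam @ lam.T
    unique = 1.0 - np.diag(common)  # unique variances, non-negative by construction
    return common + np.diag(unique)

# Blocked design: 25 variables, 5 factors, 5 variables per factor (assumed alphas).
p, m = 25, 5
alpha = np.ones((p, m))
for i in range(p):
    alpha[i, i // 5] = 10.0         # expected dominant loading 10/14, others 1/14

lam = simulate_loadings(alpha)
rho = loadings_to_correlation(lam)
```

Larger concentration values with the same ratios would keep the mean loadings but reduce the row-to-row randomness, which is one way to control how structured the resulting matrix is.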
For each correlation matrix, simulation runs were conducted for sample sizes of N = 100,
300, 500, 700, and 1000. In each run, samples of the respective size (N) were drawn from a
multivariate normal distribution with mean zero and the respective correlation matrix as its
covariance matrix. Using these simulated samples, K1, parallel analysis, BIC, sample size
adjusted BIC, and the proposed method were applied to determine the number of factors. For
each run, 100 repetitions were conducted, and the number of successful estimates
(equivalently, the percent of correct estimations) was recorded.
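The simulation loop can be sketched as below, assuming NumPy. For brevity only the K1 rule is scored, and the 4-variable, 2-factor population matrix is a hypothetical toy example rather than one of the nine structures studied here.

```python
import numpy as np

def kaiser_k1(R):
    """Number of eigenvalues of R greater than 1."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))

def simulate_accuracy(rho, m_true, n, reps=100, seed=0):
    """Proportion of repetitions in which the estimate equals the true m."""
    rng = np.random.default_rng(seed)
    p = rho.shape[0]
    hits = 0
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), rho, size=n)  # mean zero, covariance rho
        R_hat = np.corrcoef(X, rowvar=False)                   # sampled correlation matrix
        hits += int(kaiser_k1(R_hat) == m_true)
    return hits / reps

# Toy population correlation matrix implied by two orthogonal factors.
lam = np.array([[0.7, 0.1], [0.7, 0.1], [0.1, 0.7], [0.1, 0.7]])
rho = lam @ lam.T
np.fill_diagonal(rho, 1.0)

accuracy = {n: simulate_accuracy(rho, m_true=2, n=n) for n in (100, 300, 500, 700, 1000)}
```

Replacing `kaiser_k1` with any other estimator of $m$ gives the corresponding column of an accuracy table such as Table 1.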
Results
The proposed trace function method performed increasingly well with increasing
sample sizes. In some cases (e.g. 'moderate' blocked structure) accuracy increased from
approximately 5% at $N = 100$ to 100% at $N = 1000$, while the accuracy of the
other, standard methods varied less across $N$ (Table 1). Weaker structures were more difficult
for the proposed method to identify. For the 'moderate, low dim' matrix, accuracy varied from
5 to 35% as one increased the sample size from 100 to 1000 (Table 1).
While the trace function improved its estimation from 100 samples to 1000 for each
proposed correlation matrix, the K1 method as well as the SSBIC method did not once improve
from the same change in N (Table 1). Meanwhile the BIC method improved only once in the 9
proposed scenarios while parallel analysis improved in two of the 9 (Table 1). Overall, all
tested methods performed very well with stronger correlation structures, although BIC
performed comparatively worse in the CRS and 10-factor settings (Table 1).
There were cases where standard methods performed outstandingly well. In the strong
correlation simulations, the SSBIC, K1, and parallel analysis approaches performed at 100%
accuracy across N. Simulations involving the three CRS correlation structures and the ten factor
structure saw perfect accuracy across N with SSBIC and K1 methods while the trace method
achieved at least 97% accuracy at N of 1000 (Table 1). In the moderate low dimensional and
moderate different dimensional matrices at low N, all methods performed poorly, with the
trace function (N = 100, accuracy = 5%) and K1 (N = 100, accuracy = 16%) methods achieving
the best results in each respective scenario (Table 1).
Application to the CRS Study
Our analysis addresses 3535 Geisinger Health System patients who were followed for a
duration of 16 months, each of whom was selected using a stratified sampling method designed
to oversample racial minorities and those with a high propensity for CRS via International
Classification of Diseases (ICD-9) and Current Procedural Terminology defined attributes in
electronic health record data (Hirsch et al., 2017; Tustin et al., 2017). Each of these patients
responded to three questionnaires containing 37 common questions at baseline, 6 months, and
16 months. Of these 37 questions, 21 inquired about the presence and severity of CRS nasal
and sinus symptoms while the remaining questions assessed presence of asthma, allergy, ear and
constitutional symptoms. All of the questions inquired about the frequency of experiencing a
symptom, or the frequency of being bothered by a symptom, in a given time span, and each
question was answered on the same Likert scale (1 = never, 2 = once in a while, 3 = some of the
time, 4 = most of the time, or 5 = all the time). These questions were specifically designed to
predict sinus opacification location and severity using only self-reported symptoms. Polychoric
correlations were calculated from each of these surveys, and each of the methods studied in
our simulation study was applied to determine the number of factors to extract.
K1 and parallel analysis indicated (6, 5, 7) and (5, 5, 6) factors for the baseline, 6 month, and 16
month questionnaires respectively, while BIC (15, 16, 14), SSBIC (17, 17, 20), and trace (13, 13,
16) suggested substantially higher numbers of factors for the same three questionnaires. Scree
plots were also examined, which appeared to suggest 5 factors in each survey. The trace
function (Figure 3) may indicate some ambiguity in factor number choice, achieving a minimum
at 13 factors but with only modest slope below and above this number, much less than in other
simulated scenarios (see appendix). We examined a 13-factor solution for the baseline CRS EFA,
extracted using the same OLS method and oblimin rotation as in the 5 factor EFAs, in order to
examine the qualitative differences driven by the differences in the number of extracted
factors. Two factors' interpretations were invariant: the facial pain and pressure symptom and
smell loss symptom factors. Other factors, however, were reduced to identifying symptoms
related to single phenomena or organs (nasal congestion, ear, eye, fatigue, etc.) and addressing
both presence and severity (when present in the surveys).
The substantive CRS factor analysis was not approached from a dogmatic factor model
vantage point, but rather aimed to identify biologically feasible symptom clusters within the
questionnaire EFAs. As such, scree plots and parallel analysis were prioritized, and in addition
were viewed as being parsimonious compared to other methods. Although parallel analysis did
not suggest the same number of factors for each of the questionnaires, it was decided to extract
the same number from each for interpretability and comparison reasons. Further elaboration is
provided in the CRS EFA chapter.
Discussion
The results of our simulation study provide some insight into the efficacy of the
proposed method as well as the effectiveness of other, commonly utilized methods
incorporated into this study (K1, parallel analysis, SSBIC, BIC). It was shown that, with increasing
sample sizes, the trace function method performed progressively better, eclipsing the
performance of other methods in many tested circumstances (Table 1). Interestingly, the other
methods did not appear to vary in performance with increased $N$ across the majority of tested
circumstances (Table 1).
In these simulated matrices, the trace function method outperformed the three
standard methods when the strength of the underlying factors was small (Table 1). The $N$
required to attain superior performance varied, but seemed to be less for weaker correlation
structures (Table 1). For stronger correlation structures, each of the standard methods tested
produced consistently very accurate results (Table 1). In all three CRS implied correlation
matrices, the sample size adjusted BIC as well as the eigenvalue greater than 1 method
achieved perfect accuracy across all values of $N$, while the standard BIC measure performed
very poorly, with accuracies consistently below 10%. Although the trace function did not attain
100% accuracy across all N, it improved from 61, 46 and 34% accuracy at $N = 100$ to 100, 97,
and 99% accuracy at $N = 1000$ for baseline, 6 month, and 16 month CRS matrices respectively
(Table 1).
There have been many methods developed to select the number of factors in an EFA
setting, all with slightly different interpretations, strengths and weaknesses. Frequently,
practitioners conducting EFA treat the factor model literally and strive to uncover the 'true' $m$
and evaluate their respective loadings accordingly (Preacher et al., 2013). We believe this goal
often is unclear, and potentially unhelpful, for two reasons. First, in practice, researchers apply
several different criteria to the target they seek, including a literal number of factors underlying
the observed data, the most interpretable number of factors, and various other criteria.
Second, as with linear regression, models are only approximate; with that in mind, we
understand that we will never observe 'truth', only estimations and approximations (Preacher
et al., 2013). As such, a factor, once identified, is not necessarily real but at best a useful
approximation to real (biological or otherwise) phenomena.
Searching for the literal number of factors is arguably infeasible, outside of extremely
controlled settings, because observed variables likely are affected by a huge number of
contributing 'factors' (latent or otherwise). Consider the number of factors underlying an
individual's take home income. There are likely 'strong' factors including age, education, work
ethic and industry. But there also is a profusion of 'weak' factors, such as appearance, sense of
style, and voice quality, which may be relatively unmeasurable but could drive stark differences
in income amount. Notwithstanding that factor models typically must greatly simplify data
interrelationships, the estimation of factor presence and identity can be an extremely powerful
tool, providing insight into disease pathologies and symptom classifications. In the motivating
study, EFA was used to better understand the relationships between CRS symptoms and latent
factors. It was hypothesized that these factors reflected pathobiological phenomena, shedding
light on what symptom question responses are truly measuring in this population.
As seen in this study's application to the CRS data, differing methods could potentially
estimate a wide range of numbers of factors. We hypothesize that this discrepancy reflects the
differing objective functions employed by the various methods. The K1 and parallel analysis
methods are designed to estimate the dimensionality needed to represent shared covariation
among one's items, without explicit reference to a factor model. BIC and the trace function both
explicitly incorporate the factor model specification in determining fit to the empirical
covariance matrix, and hence address FA assumptions in addition to dimensionality. It may not
then be surprising that these methods require a higher dimensionality to reproduce the
empirical data structure. We consider the sensitivity of the dimensionality choice to the factor
model assumptions to be instructive in the present case. In the CRS application, simple
dimensionality was of interest, rather than consistency with a factor model per se, making
prioritization of those methods tuned to this, as well as a biologically meaningful interpretation,
appropriate. When extracting the 13 factors consistent with the trace method from the baseline
CRS study, moreover, the majority of factors extracted appeared as 'single symptom' factors,
addressing single phenomena or organs. These factors are not particularly interesting
from a biological or medical standpoint, but may rather suggest 'latent' constructs associated
with patient responses (similar symptom questions are answered similarly), while other
methods such as parallel analysis provide a more desirable understanding of latent constructs
associated with actual patient symptoms.
Since the 'true' number of factors may have questionable meaning in some
circumstances, some have argued that searching for the 'optimal' number of factors may be the
best course of action, based on a criterion centered around achieving a specific goal such as
verisimilitude, or appearance of reasonable truth (Preacher et al., 2013). Other criteria for
“optimal” numbers of factors may include generalizability, the ability to attain similar results on
an independent data set collected from the same population (Myung, 2000), or accurate and
precise data approximation (Owen & Wang, 2016).
The method presented here seeks to approximate an underlying number of factors
without forgoing generalizability. A frequent problem in the EFA framework is that an EFA from
one sample may not necessarily match another EFA carried out on an independent sample from
the same population. By incorporating and embracing the idea of cross validation and
generalizability from the beginning, there is a possibility that cross validated approaches may
remove variability in the choice of $m$, resulting in more consistent results across studies
(Friedman, Hastie, & Tibshirani, 2001). These conjectures warrant further research as there are
many cases and types of data in which EFA frameworks would be utilized.
An interesting observation from our simulations is that the standard methods SSBIC, BIC,
and K1 seem to produce the same error rate regardless of $N$ (Table 1). This empirical result
suggests that these methods may not be consistent estimators of $m$ in these tested settings.
The trace function, on the other hand, increased nearly monotonically in accuracy with
increasing $N$ (Table 1). In addition, by utilizing an h-fold cross validated procedure, accuracy may
increase more steeply as a function of N, as $h > 2$ has been shown to improve estimation
accuracy and convergence (Friedman et al., 2001). Several studies have sought to show
consistency in the EFA framework, as traditional methods such as BIC may not be consistent for
$m$ in all settings (Bai & Ng, 2002). Utilizing a greater range of $N$ here may enable us to show
some empirical consistency with some of the methods used, although only the trace function
method appeared to approach this across a variety of sample and correlation structure settings.
Future work
Our simulation study provides some insight into the performance of our cross validated
discrepancy approach to EFA model selection. However, only a small sampling of possible
correlation structures was utilized, all with similar true $m$ ($m = 5$ or $m = 10$). Although we
hypothesize that the sign of any loading will not change the accuracy of any of the methods
utilized, it is important to note that all of the simulated matrices contained only positive factor
loadings.
Expanding the proposed method to implement h-fold cross validation is a logical next
step. Currently, the data are split into two and $m$ is chosen to be the number of factors which
minimizes the discrepancy between fit model and held out data. Typically, allowing for an
increased number of data partitions leads to more accurate estimates of the error rate, and in
this case, may produce more stable estimates of $m$, potentially at lower values of $N$ as well
(Friedman et al., 2001).
Additionally, it is important to assess the estimation bias in all methods discussed.
Currently, accuracy was assessed only as whether or not the correct number of factors was
identified. Considering the importance of understanding a method's tendency to over or under
factor, we plan also to assess the magnitude and direction of each method's misses. If over
factoring is, generally, better than under factoring, for instance, we may wish to penalize
overestimation of $m$ less than any underestimation. In addition, missing the number of factors
by 1 is likely less deleterious than missing by 4, for example, so the magnitude by which any
given method mis-estimated should be considered in future work.
Tables and Figures
Table 1. Number of correct factor number assessments by sample size adjusted BIC (SSBIC), standard BIC (BIC), Kaiser eigenvalues greater than 1 rule (K1), parallel analysis (PA), and the proposed method (trace) out of 100 simulation replicates.
Simulated samples   Correlation structure   SSBIC   BIC   K1    PA    Trace
100                 Strong                  100     98    100   100   79
300                 Strong                  100     99    100   100   97
500                 Strong                  100     99    100   100   99
700                 Strong                  100     98    100   100   97
1000                Strong                  100     98    100   100   99
100                 Moderate                25      0     92    99    5
300                 Moderate                20      0     83    100   69
500                 Moderate                23      0     81    100   98
700                 Moderate                16      0     90    99    98
1000                Moderate                24      0     86    100   100
100                 Weak                    2       0     38    75    0
300                 Weak                    0       0     34    89    30
500                 Weak                    0       0     44    79    72
700                 Weak                    1       0     40    82    89
1000                Weak                    0       0     33    82    97
100                 Moderate, low dim       0       0     0     0     5
300                 Moderate, low dim       0       0     0     0     8
500                 Moderate, low dim       0       0     0     0     19
700                 Moderate, low dim       0       0     0     0     13
1000                Moderate, low dim       0       0     0     0     35
100                 Moderate, dif dim       0       0     16    0     2
300                 Moderate, dif dim       0       0     14    0     11
500                 Moderate, dif dim       0       0     11    0     41
700                 Moderate, dif dim       0       0     14    0     54
1000                Moderate, dif dim       0       0     14    0     75
100                 CRSibl                  100     7     100   100   61
300                 CRSibl                  100     5     100   100   100
500                 CRSibl                  100     7     100   99    100
700                 CRSibl                  100     3     100   100   100
1000                CRSibl                  100     9     100   99    100
100                 CRSi6m                  100     0     100   99    46
300                 CRSi6m                  100     0     100   100   94
500                 CRSi6m                  100     0     100   98    98
700                 CRSi6m                  100     0     100   97    95
1000                CRSi6m                  100     0     100   99    97
100                 CRSi16m                 100     0     100   95    34
300                 CRSi16m                 100     0     100   95    96
500                 CRSi16m                 100     0     100   91    98
700                 CRSi16m                 100     0     100   87    96
1000                CRSi16m                 100     0     100   91    99
100                 10 factor               100     0     100   95    34
300                 10 factor               100     0     100   95    96
500                 10 factor               100     0     100   91    98
700                 10 factor               100     0     100   87    96
1000                10 factor               100     0     100   91    99
Table 2. Estimated number of factors for each of the 3 questionnaires (baseline, 6 month, and 16 month follow ups) from commonly utilized methods including Kaiser eigenvalues greater than 1 rule (K1), parallel analysis (PA), standard BIC (BIC), empirical BIC (EBIC), sample size adjusted BIC (SSBIC), and the proposed method (TRACE).
Method   Baseline   6-month follow up   16-month follow up
K1       6          5                   7
PA       5          5                   6
BIC      15         16                  14
EBIC     8          8                   8
SSBIC    17         17                  20
TRACE    13         13                  16
Figure 1. Strong “blocked” factor loadings utilized to create the correlation matrix in the simulation study.
Figure 2. Correlation matrix generated from the strong “blocked” factor loadings matrix.
Figure 3. Trace function's discrepancy values on the baseline CRS data. Vertical line denotes the minimum achieved at 13 factors.
Appendix
Additional Figures:
Figure A1. The utilized strong loading matrix, created using the Dirichlet simulation process, consisting of 5 factors and 25 variables.
Figure A2. The utilized strong correlation matrix, created from the corresponding strong loading matrix.
Figure A3. The utilized moderate loading matrix, created using the Dirichlet simulation process, consisting of 5 factors and 25 variables.
Figure A4. The utilized moderate correlation matrix, created from the corresponding moderate loading matrix.
Figure A5. The utilized weak loading matrix, created using the Dirichlet simulation process, consisting of 5 factors and 25 variables.
Figure A6. The utilized weak correlation matrix, created from the corresponding weak loading matrix.
Figure A7. The utilized moderate/low dimensional loading matrix, created using the Dirichlet simulation process, consisting of 5 factors and 11 variables.
Figure A8. The utilized moderate/low dimensional correlation matrix, created from the corresponding moderate/low dimensional loading matrix.
Figure A9. The utilized moderate/different dimensional loading matrix, created using the Dirichlet simulation process, consisting of 5 factors and 27 variables.
Figure A10. The utilized moderate/different dimensional correlation matrix, created from the corresponding moderate/different dimensional loading matrix.
Figure A11. The utilized ten factor loading matrix, created using the Dirichlet simulation process, consisting of 10 factors and 100 variables.
Figure A12. The utilized ten factor correlation matrix, created from the corresponding ten factor loading matrix.
Figure A13. Trace function's discrepancy values on the strong “blocked” factor loading matrix. Vertical line denotes the minimum achieved at 5 factors.
Simulating Factor Model Correlation Matrices
Motivation
In simulations of factor analyses, it is important to be able to randomly generate valid
correlation matrices which stem from some known factor model in order to assess model
selection methods and other attributes of factor analysis procedures.
Computation
A core idea of factor analysis is that we can explain variability in our observed data by
means of a smaller number of underlying, latent factors, which are associated with observed
variables. Mathematically speaking, our correlation matrix, $\rho$, can be broken down as such:
$\rho = \Lambda\Psi\Lambda' + \Delta^2$
where $\Lambda$ is the $p \times m$ matrix of factor loadings, $\Psi$ is the ($m \times m$) factor correlation matrix, and
$\Delta^2$ is the matrix of unique variances ($p \times p$ diagonal matrix).
This can be further written as $\rho = \Lambda\Psi\Lambda' + (I - \mathrm{diag}(\Lambda\Psi\Lambda'))$.
If we are interested in generating a random, structured loadings matrix, the following
procedure is proposed. We can treat each row $i$ of $\Lambda$ as a
$\mathrm{Dirichlet}(\alpha_{i,1}, \ldots, \alpha_{i,m}) \times \mathrm{Beta}(x_i, y_i)$ random vector,
where each $\alpha_{i,j}$ is some proposed weight as to how strongly we would like (on average)
variable $i$ to load on factor $j$.
These constraints ensure that $\Lambda$ will be a valid loadings matrix as:
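• All loadings are between -1 and 1
• The sum of squared loadings (for a variable) is less than 1 (avoiding a Heywood case)
Now, we will let $\Psi$ be the $m$-dimensional identity matrix (all factors are orthogonal).
$\rho = \Lambda\Psi\Lambda' + (I - \mathrm{diag}(\Lambda\Psi\Lambda'))$
$= \Lambda\Lambda' + (I - \mathrm{diag}(\Lambda\Lambda'))$
We know $\Lambda\Lambda'$ is positive semi-definite as each element of $\Lambda$ is a real number.
We also know that a positive semi-definite matrix plus a diagonal matrix of the same dimension
with all non-negative entries is also positive semi-definite. Because the sum of squared loadings
in each row does not exceed 1, $I - \mathrm{diag}(\Lambda\Lambda')$ is such a diagonal matrix, and it
follows that $\rho$ is positive semi-definite.
In addition, because each entry of $\Lambda$ is in [0,1], we know each element of $\Lambda\Lambda'$ will be
in [0,1], while the $+(I - \mathrm{diag}(\Lambda\Lambda'))$ term ensures that the diagonal entries of $\rho$ are
all 1. Thus, $\rho$ is a valid correlation matrix, uniquely determined by $\Lambda$.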
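The positive semi-definiteness argument can also be written out directly as a quadratic form. The short derivation below is a sketch using only the quantities already defined, with $\lambda_{ij}$ denoting the entries of $\Lambda$ and $x$ an arbitrary vector in $\mathbb{R}^p$ (both notations are introduced here for illustration).

```latex
\begin{aligned}
x'\rho x
  &= x'\Lambda\Lambda' x + x'\bigl(I - \mathrm{diag}(\Lambda\Lambda')\bigr)x \\
  &= \lVert \Lambda' x \rVert^2
     + \sum_{i=1}^{p}\Bigl(1 - \sum_{j=1}^{m}\lambda_{ij}^2\Bigr)x_i^2 \\
  &\ge 0,
  \qquad\text{since } \sum_{j=1}^{m}\lambda_{ij}^2 \le 1 \text{ for every row } i .
\end{aligned}
```

The first term is a squared norm and the second is a sum of non-negative terms, so the bound holds for every $x$.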
Use
This idea of being able to construct random correlation matrices is important to certain
simulation studies where one must generate random yet valid correlation matrices in which the
true number of factors is known and fixed. Because any correlation matrix stemming from this
method would inherently decompose perfectly into the true number of factors,
sampling noise should be added. This can be accomplished by sampling from a multivariate
distribution (in this case multivariate normal) with the specified correlation matrix, then
computing a 'simulated empirical' correlation matrix from the simulated multivariate data.
Further work
The above method can produce many types of random correlation matrices; however,
all loadings must be positive, which is not required of factor analysis loadings. We may be able
to incorporate negative loadings by utilizing a Bernoulli process by which each cell of $\Lambda$ has
some probability of being multiplied by -1 rather than 1. This would allow for negative loadings
(and correlations) while ensuring $\rho$ remains a valid correlation matrix.
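A small sketch of this sign-flipping idea follows, assuming NumPy. The flip probability of 0.3 and the Dirichlet parameters are arbitrary illustrative choices, not values proposed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-negative loadings: 6 variables, 3 factors, each row a Dirichlet draw.
lam = rng.dirichlet([5.0, 1.0, 1.0], size=6)

# Bernoulli sign process: each cell is multiplied by -1 with probability 0.3.
signs = np.where(rng.random(lam.shape) < 0.3, -1.0, 1.0)
lam_signed = lam * signs

# Squared loadings are unchanged by the flips, so the unique variances are too.
rho = lam_signed @ lam_signed.T
np.fill_diagonal(rho, 1.0)  # same as adding I - diag(Lambda Lambda')

min_eigenvalue = np.linalg.eigvalsh(rho).min()
```

Since the flips leave each row's sum of squared loadings untouched, the positive semi-definiteness argument given in the Computation section carries over unchanged.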
Chapter 3 - Exploratory Factor Analysis of CRS Symptoms
Introduction
Chronic rhinosinusitis (CRS) is an inflammatory condition characterized by nasal and
sinus symptoms, affecting 15% of the United States population (Fokkens et al., 2012). There are
considered to be four cardinal symptoms of the disease, which include nasal drainage (anterior
or posterior), nasal blockage (congestion), smell loss, and facial pain or pressure lasting for 12
or more weeks (Browne, Hopkins, Slack, & Cano, 2007; Tan, Kern, Schleimer, & Schwartz, 2013).
The European Position Paper on Rhinosinusitis and Nasal Polyps (EPOS) diagnosis methodology
for CRS is based upon the presence of nasal obstruction or discharge and at least one other
symptom, as well as objective evidence of inflammation on sinus computerized tomography (CT)
scan or sinus endoscopy, which may include sinus or ostiomeatal complex mucosal changes,
presence of nasal polyps, or mucopurulent discharge from the middle meatus (Fokkens et al., 2012).
Because of the difficulty of obtaining sinus CT or endoscopy in large-scale population studies,
EPOS also has an epidemiologic definition of CRS based on symptoms and duration only.
However, EPOS does not specify how to measure symptoms in terms of severity (e.g., some
blockage or complete blockage; partial smell loss or complete smell loss; the quantity of
discharge; the severity of pain) or frequency (e.g., some of the time, most of the time, or all of
the time).
Nasal and sinus symptoms lasting three months are quite common, and many studies
have reported that there is not a strong correlation between such symptoms and objective
opacification on sinus CT scans (Browne et al., 2007; Ferguson, Narita, Yu, Wagener, &
Gwaltney, 2012; Fokkens et al., 2012). Up to 40% of those with symptoms meeting EPOS criteria
for CRS do not have significant sinus opacification on CT (Ferguson et al., 2012). The lack of
correlation between symptoms meeting EPOS criteria for CRS and findings on sinus CT could be
due to imprecision in the ways that nasal and sinus symptoms have been measured in terms of
severity, frequency, and duration (Hamilos, 2011; Fokkens et al., 2012). In addition, there are few
studies that examine how nasal, sinus, and other relevant symptoms relate to one another
within patients cross-sectionally or longitudinally. Understanding these relationships among
symptoms may guide more precise symptom measurement in ways that increase the likelihood
that patients with certain nasal and sinus symptoms also have objective evidence of
opacification.
We used exploratory factor analysis (EFA) to assess the latent structure of nasal, sinus
and other common, relevant symptoms at cross-section for three separate time points, as well
as the change in these symptoms over time. By latent structure of symptoms, we mean the
otherwise unseen patient attributes driving the manifestation of symptoms. While prior studies
have applied EFA to CRS symptoms at one point in time, they have utilized the Sino-nasal
Outcome Test (SNOT) family of questionnaires, designed to assess treatment effectiveness
among patients known to have CRS (Browne et al., 2007; Hopkins, Browne, Slack, Lund, &
Brown, 2007). SNOT assesses symptom severity in a two-week recall window, so it cannot be
used to evaluate compliance with EPOS duration criteria, and it does not evaluate symptom
frequency (Hopkins et al., 2007; Fokkens et al., 2012). The questionnaire utilized in this study
incorporated questions assessing frequency of EPOS defined symptoms, as well as frequency of
severe and related symptoms, in order to assess a broad range of potentially CRS-associated
manifestations. Understanding how symptoms may group at one point in time and change over
time could allow development of more precise approaches to symptom measurement, and could
also allow development of different biologic rationales for why these symptoms group the way
they do.
Methods
Study population and design
A total of 200,769 Geisinger Clinic primary care patients over the age of 18 years with
both electronic health record (EHR) and race/ethnicity data were eligible for participation in
this study. From these patients, 23,700 were chosen to be recipients of a series of
questionnaires utilizing a sampling scheme that has been previously described (Hirsch et al.,
2017; Tustin et al., 2017). In brief, using a stratified random sampling method to over-sample
both racial/ethnic minorities as well as those with higher likelihoods of CRS using International
Classification of Diseases (ICD-9) and Current Procedural Terminology codes in EHR data,
patients were selected to receive self-administered questionnaires through the mail (Hirsch et
al., 2017; Tustin et al., 2017).
Participants who returned the baseline questionnaire (n = 7847) were followed for 16
months, from April 2014 to August 2015, and received two additional questionnaires at six
months and 16 months. Non-responders were sent questionnaires one or two additional times.
The questionnaires were diverse in terms of information requested, providing information
about a spectrum of symptoms including presence, frequency, severity, and bother of a range
of symptoms associated with CRS and co-morbid conditions like headache disorders and
asthma (Table 1, Hirsch et al., 2017; Tustin et al., 2017). Each questionnaire included 37
common questions, each with the same response options (how often the symptom occurred in
the past three months as 1 = never, 2 = once in a while, 3 = some of the time, 4 = most of the
time, or 5 = all the time; Table 2). A total of 21 questions about the presence, severity,
and degree of bother of CRS nasal and sinus symptoms were incorporated, while the remaining
questions assessed presence of four asthma symptoms; four allergy symptoms; three ear
symptoms; and five constitutional and other related symptoms (Table 2).
Data collection
The baseline questionnaire was mailed in April 2014, the 6-month follow-up in October
2014, and the 16-month follow-up in August 2015. These consisted of 94, 87, and 79 questions,
respectively, but the current analysis focused on the 37 questions that were common to all
three (Table 2). After questionnaires were received, each was scanned and the data were then
double-checked and verified. A total of 7834 persons returned the baseline questionnaire
(responding to at least one of the 37 questions of interest), 4945 returned the 6-month
follow-up questionnaire, and 4584 returned the 16-month follow-up questionnaire.
Skip patterns were present in the questionnaires, by which patients would be asked to
skip blocks of questions if the responses to these questions could be completely determined by
previous responses. This occurred only when patients indicated that they had not experienced
the symptom(s) of interest, making further questions pertaining to that symptom irrelevant.
These skip patterns were accounted for by filling in implied responses when skip-pattern
missingness was present.
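The skip-pattern handling described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the implied response for a skipped follow-up question is 1 ("never"), and the function name and the use of None for a missing response are hypothetical.

```python
from typing import List, Optional, Sequence

NEVER = 1  # assumed implied response: "never" on the 1-5 frequency scale


def fill_skipped(gate: Optional[int],
                 followups: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Fill follow-up items left blank because of a skip pattern.

    gate      -- response to the screening question (None if missing)
    followups -- responses to the questions the patient was told to skip
    """
    if gate != NEVER:
        # The skip did not apply, so any blanks are genuine missingness.
        return list(followups)
    # The patient reported never having the symptom: blanks are implied "never".
    return [NEVER if r is None else r for r in followups]


print(fill_skipped(1, [None, None, None]))
print(fill_skipped(3, [4, None, 2]))
```

Blanks that are not explained by a skip pattern are left as missing, so they can be handled separately by imputation.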
Analytic variables
The European Position Paper on Rhinosinusitis and Nasal Polyps subjective (EPOSs)
criteria were used to classify patients as current, previous, or never CRS based on
patient-reported symptoms from only the baseline questionnaire. EPOSs criteria require
obstruction or anterior or posterior discharge, together with one other of the cardinal
symptoms of smell loss, facial pain, or facial pressure, lasting three or more months.
Patients were classified using questionnaire responses, specifically lifetime and previous
3 months of symptoms, and were labeled as current CRS if they met EPOSs CRS criteria in the
three months before the baseline questionnaire; as past CRS if they met these criteria in
their lifetime but not in the three months before the baseline questionnaire; and as never
CRS if they never met these criteria in their lifetime. The questionnaire, from the Chronic
Rhinosinusitis Integrative Studies Program, has been previously described (Hirsch et al.,
2017; Wj et al., 2012) and included income and education information at baseline. Other
demographic characteristics including age, sex, and race/ethnicity, as well as health
information such as body mass index (BMI, measured in kg/m2), were collected via electronic
health record data.
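The three-way classification above can be sketched as a small decision rule. This is only an illustration of the logic as worded in this section: the boolean inputs are hypothetical summaries of the questionnaire responses, and how each response is mapped to a "symptom present for three or more months" flag is not specified here.

```python
def meets_epos(obstruction: bool, discharge: bool,
               smell_loss: bool, facial_pain: bool,
               facial_pressure: bool) -> bool:
    """EPOSs rule as worded above: obstruction or discharge, plus one other
    cardinal symptom (smell loss, facial pain, or facial pressure)."""
    return (obstruction or discharge) and (
        smell_loss or facial_pain or facial_pressure
    )


def crs_status(met_past_3_months: bool, met_lifetime: bool) -> str:
    """Label a patient as current, past, or never CRS."""
    if met_past_3_months:
        return "current"
    if met_lifetime:
        return "past"
    return "never"


recent = meets_epos(True, False, False, True, False)
print(crs_status(recent, met_lifetime=True))
```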
Statistical Analysis
The goals of the analysis were to identify the underlying structure, if present, among the
37 survey questions at each questionnaire time point, and then among the change in these
symptoms over time, from baseline to 6-month follow-up and from 6-month follow-up to
16-month follow-up questionnaires. Of the 7847 patients who returned the baseline
questionnaire, the analysis included the 3535 patients who returned all three questionnaires
with no more than 5 missing values for the 37 questions on any single questionnaire. We did
not want to impute values for subjects with many missing questions since the primary goal of
the analysis was to evaluate the underlying latent structure of the patterns of symptom
reporting, and a large portion of patients were only missing a small number of responses. For
subjects with five or fewer missing values, whose missingness we assumed to be at random,
multivariate imputation by chained equations was conducted to impute missing values for
patient questionnaires that were included in this study (3.5%), utilizing only information
within each survey. This imputation was carried out via the mice R package using the
predictive mean matching method (Buuren & Groothuis-Oudshoorn, 2011). Once data were
finalized for each patient questionnaire, two change scores were calculated as the difference
between each person’s adjacent questionnaires (baseline to 6-month and 6-month to 16-month).
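The inclusion rule and the change scores described above can be sketched as follows. The function names are hypothetical, None stands for a missing response, and the sign convention (later questionnaire minus earlier questionnaire) is an assumption, since the text states only that adjacent questionnaires were differenced.

```python
from typing import List, Optional

MAX_MISSING = 5  # at most 5 of the 37 items may be missing per questionnaire


def eligible(questionnaires: List[List[Optional[int]]]) -> bool:
    """True if all three questionnaires were returned and none has more
    than MAX_MISSING missing items."""
    return len(questionnaires) == 3 and all(
        sum(r is None for r in q) <= MAX_MISSING for q in questionnaires
    )


def change_score(earlier: List[int], later: List[int]) -> List[int]:
    """Item-wise difference between adjacent (completed) questionnaires,
    computed here as later minus earlier."""
    return [b - a for a, b in zip(earlier, later)]


baseline, month6, month16 = [1, 3, 5], [2, 3, 4], [2, 1, 4]
print(change_score(baseline, month6))   # first change score
print(change_score(month6, month16))    # second change score
```

With five response levels, each item-level change score takes integer values from -4 to 4.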
Due to the exclusion criteria utilized in this study, not all patients were included in the
final analysis. Summary statistics of demographic, health, and socioeconomic information were
computed and compared between the individuals included in this analysis and those who were
excluded. In addition, lasagna plots were examined in order to visually assess the transitions
between individual question responses over the three questionnaires of the study period
(Figure 1, Swihart et al., 2010).
Exploratory factor analysis was utilized as there were multiple hypotheses and little a
priori knowledge of the underlying structure of symptom reporting. Recognizing the ordinal
scaling of the data, implied Pearson (polychoric) correlations were estimated among the 37
questions for the three cross-sectional questionnaires, using the quick two-step procedure as
implemented by the psych R package (Revelle, 2017). These correlations were then utilized in
exploratory factor analyses with ordinal variables. Meanwhile, Pearson correlation matrices
were calculated for each of the two change scores, as the difference score distributions
appeared symmetric and contained more levels than practical for polychoric correlations.
For each of the five EFAs (three cross-sectional and two differences), a factor analysis
was conducted fitting loadings estimates and communalities by applying the ordinary
(unweighted) least squares (OLS/ULS) procedure to correlations estimated as just described. An
oblimin rotation for each factor analysis was utilized in order to allow for correlations among
factors (Revelle, 2017). In EFA settings, determining the number of factors is a key step in
identifying factor structure. Commonly, many methods are utilized in order to assess which
selection is most appropriate, each with different optimality criteria driving different
interpretations of results. In this study, biological interpretability and parsimony were
stressed. In accordance with analyses and considerations provided in the previous chapter, the
qualitative method of examining scree plots and the quantitative parallel analysis method were
taken together to determine the optimal number of factors to extract. The scree plot displays
eigenvalues of the correlation matrix in rank order by size from largest to smallest (x-axis =
size rank, y-axis = eigenvalues) to assess the location of a clear “elbow” shape where the
slope of the curve changes from rapid decline in eigenvalues with increasing rank to a
flattening of the curve, as per Cattell’s Scree test (Cattell, 1966). Meanwhile, parallel
analysis compares observed eigenvalues with eigenvalues from random data matrices with
uncorrelated item responses: the number of ranked observed eigenvalues greater than the
randomly generated ones is the number of factors retained (Humphreys & Jr, 1975).
Once factor loadings were extracted, factor scores were estimated for each identified
factor for each patient using item response theory (IRT) based scores for polytomous items
for each of the three surveys (Kamata & Bauer, 2008). These estimated factor scores were
computed as a measure of the strength of each latent factor for each patient and were compared
across EPOSs CRS status groups (current, previous, never). A multivariate analysis of variance
(MANOVA) was fit in order to compare the mean multivariate factor score (vector of the
estimated factor scores) between EPOSs CRS status groups for factor scores which appeared to
follow an approximately normal distribution. One factor score had a mixed-scale distribution,
with a considerable proportion of individuals having a low (at the IRT lower bound) value and
the remaining individuals distributed relatively continuously among higher values. To relate
this score to EPOSs status, a logistic regression was computed to estimate the odds of having
a low factor score as a function of EPOSs CRS group, and a linear regression was used to
estimate the mean factor scores by EPOSs CRS group for those patients who did not have the
lower bound factor score.
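The parallel analysis retention rule described above reduces to a rank-by-rank comparison of eigenvalues. The sketch below assumes the eigenvalues have already been computed; the reference values would come from simulated uncorrelated data (for example, a mean or upper percentile across simulations), and the numbers shown are made up for illustration. Stopping at the first rank where the observed eigenvalue fails to exceed its reference is a common convention and is an assumption here.

```python
from typing import Sequence


def n_factors_parallel(observed: Sequence[float],
                       reference: Sequence[float]) -> int:
    """Count the leading observed eigenvalues that exceed the reference
    eigenvalue of the same rank."""
    obs = sorted(observed, reverse=True)
    ref = sorted(reference, reverse=True)
    retained = 0
    for o, r in zip(obs, ref):
        if o <= r:
            break  # stop at the first rank that does not beat random data
        retained += 1
    return retained


# Hypothetical eigenvalues, not taken from the study.
observed = [5.0, 2.0, 1.1, 0.6]
reference = [1.5, 1.3, 1.2, 1.0]
print(n_factors_parallel(observed, reference))
```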
We also sought to assess whether or not the baseline–6 month difference captured
more variability than the 6 month–16 month difference. To this end, each difference EFA
communality was extracted and then compared by time period (baseline–6 month versus 6
month–16 month), using a Wilcoxon signed rank test. We hypothesized higher mean
communality values in the baseline–6 month period EFA than the 6 month–16 month period,
corresponding to higher stability over a shorter period for change.
Sensitivity Analysis and Diagnostics
Diagnostics
Kaiser-Meyer-Olkin (KMO) factor adequacy was evaluated for each computed
correlation matrix to further assess the appropriateness of factor analysis. KMO measure of
sampling adequacy (MSA) statistics of 0.96, 0.96, and 0.95 were observed for the baseline,
6-month follow-up, and 16-month follow-up questionnaires, respectively. The two change score
difference correlation matrices yielded KMO MSAs of 0.91 and 0.90 for the first and second
differences, respectively. These KMO statistics, all of which were 0.90 or greater, indicated
a very high degree of common variance, and supported a conclusion that our correlation
matrices were very well suited to be subjected to factor analysis.
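For reference, the overall KMO statistic reported above is conventionally defined as below. The notation is introduced here for illustration and is not taken from the thesis: r_ij denotes the correlation between items i and j, and q_ij the partial correlation between items i and j after controlling for all other items.

```latex
\mathrm{KMO} \;=\;
\frac{\sum_{i \neq j} r_{ij}^{2}}
     {\sum_{i \neq j} r_{ij}^{2} \;+\; \sum_{i \neq j} q_{ij}^{2}}
```

Values approach 1 when the partial correlations are small relative to the raw correlations, that is, when most of the association between any two items is shared with the remaining items.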
Sensitivity to factoring method
Each factor analysis was refit using weighted least squares, principal factors, maximum
likelihood, and generalized least squares to ensure the qualitative interpretation of the
loadings was not conditional on the factoring method. We selected ordinary least squares as
the final factoring method as OLS produces unbiased rotated factor loadings and has desirable
characteristics at large sample sizes (Lee, Zhang, & Edwards, 2012). Loading matrices,
communalities, and inter-factor correlation matrices were examined. Loading matrices, which
may contain entries ranging from -1 to 1, provide a measure of the strength of the relationship
between each question and each of the extracted factors, while the communalities for each
question, which range from 0 to 1, are interpreted as the fraction of each question’s
variability that is explained by the utilized factor model. Finally, inter-factor correlation
matrices examined each factor’s relations with the other factors that were derived from the
final EFA.
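The link between loadings, inter-factor correlations, and communalities can be made concrete. Under an oblique rotation, the communality of an item is the corresponding diagonal entry of L Phi L', where L is the pattern (loading) matrix and Phi the inter-factor correlation matrix; with uncorrelated factors this reduces to the sum of squared loadings. The sketch below uses made-up numbers and hypothetical names and is not the study's code.

```python
from typing import List, Sequence


def communalities(loadings: Sequence[Sequence[float]],
                  phi: Sequence[Sequence[float]]) -> List[float]:
    """Communality of each item: the diagonal of L * Phi * L^T."""
    result = []
    for row in loadings:
        m = len(row)
        h2 = sum(row[k] * phi[k][l] * row[l]
                 for k in range(m) for l in range(m))
        result.append(h2)
    return result


# One hypothetical item loading on two factors.
item = [[0.6, 0.3]]
uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.5], [0.5, 1.0]]

print(communalities(item, uncorrelated))  # sum of squared loadings
print(communalities(item, correlated))    # adds the cross-factor term
```

The second call shows why communalities from an oblique solution cannot be read off as simple sums of squared loadings when an item loads on more than one correlated factor.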
Imputation
To evaluate the sensitivity of results to missing data and imputation, a total of 100
imputed datasets were generated from the original dataset with missingness using the same
multiple imputation methodology (mice) as previously stated. Latent continuous (polychoric)
correlations were calculated and compared across imputed datasets for each questionnaire
item. For each of these 666 pairwise correlations (all of the bivariate correlations among
the responses to the 37 questions), the standard deviation was computed across the 100
imputed datasets and examined. Across these pairwise correlations, 99.5% of standard
deviations were below 0.0064, 0.0089, and 0.0036 for the baseline, 6-month follow-up, and
16-month follow-up questionnaires, respectively, suggesting that the impact of random
imputation on correlation matrices and subsequent factor analyses was minimal.
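The count of 666 follows from the number of distinct item pairs, 37 x 36 / 2. A minimal sketch of the per-pair summary is shown below; it assumes each imputed dataset has already been reduced to a correlation matrix stored as nested lists, and the three small matrices are made-up stand-ins for the 100 imputations.

```python
from itertools import combinations
from statistics import stdev
from typing import List, Sequence

N_ITEMS = 37
pairs = list(combinations(range(N_ITEMS), 2))
print(len(pairs))  # number of distinct item pairs


def sd_across_imputations(corr_matrices: Sequence[Sequence[Sequence[float]]],
                          i: int, j: int) -> float:
    """Sample standard deviation of the (i, j) correlation across imputations."""
    values: List[float] = [m[i][j] for m in corr_matrices]
    return stdev(values)


# Hypothetical 2 x 2 correlation matrices from three imputed datasets.
imputed = [
    [[1.0, 0.50], [0.50, 1.0]],
    [[1.0, 0.52], [0.52, 1.0]],
    [[1.0, 0.54], [0.54, 1.0]],
]
print(sd_across_imputations(imputed, 0, 1))
```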
Results
Description of study subjects
The 3535 patients included in the analysis were first compared to the 4312 respondents
of the baseline questionnaire who were not included (Table 1). The two groups were similar on
sex distribution (37.8% vs. 36.9% male, respectively) and mean body mass index (BMI, 30.0 vs.
30.3 kg/m2, respectively). However, included and excluded patients differed on a number of
other study variables, including age (57.5 vs. 53.2 years on average, respectively),
race/ethnicity (94.0% vs. 87.5% white, respectively), and socioeconomic status (32.9% vs.
25.1% earned over $50,000 annually, respectively).
It was observed that across time, individuals experienced varying degrees of changing
symptoms, as exemplified by responses to the question (number 3) about the frequency of
post-nasal drip across visits (Figure 1). Although symptoms at baseline generally predicted
symptoms over time, it was common for symptoms to change by one frequency category, and some
patients changed by two or more. Those who answered “never” having post-nasal drip in the
previous 3 months at baseline typically reported low frequency (never or once in a while) of
the symptom at 6 months and 16 months (Figure 1). Similarly, those who reported experiencing
the symptom “all of the time” at baseline were more likely to experience the symptom often at
6 months and 16 months. This pattern was evident in other questions as well (results not
shown); data are displayed for question 3 because it had a relatively uniform distribution of
responses at baseline (the other questions had larger proportions of subjects who reported
never experiencing the symptom).
Cross-sectional EFAs
For each of the three cross-sectional EFAs, scree plot results supported the extraction of
five factors (see appendix). Parallel analysis suggested the retention of five factors for the
baseline and 6-month questionnaires, and six factors for the 16-month follow-up. For
comparability, five factors were extracted from each of these questionnaires. Each of the
structures in these three EFAs was similar, and the content of the five factors was the same
for each, with one factor each for congestion and discharge symptoms, pain and pressure
symptoms (including headache), asthma and constitutional symptoms, ear and eye symptoms, and
smell loss (Table 3 for baseline EFA; for the other two cross-sectional EFAs, see appendix).
Factor loadings, or the degree to which any specific question was related to a specific latent
factor, were consistent across all three questionnaires (similar to results in Table 3 for
baseline, other cross-sectional EFA results not shown). Most observed communalities were high,
indicating that the factor models represented well the questions which were included in the
analysis (Table 3). A few low communalities were observed for bad breath (0.26), fever (0.34),
cold/flu symptoms (0.37) and fatigue (0.39), suggesting that our five-factor model did not
account for much of the variability in these symptoms, and thus they did not load heavily on
any single factor. After using the baseline model to estimate factors within individuals at
baseline, the inter-factor correlations resulting from oblimin rotation ranged from 0.30 to
0.64 (Figure 2).
Longitudinal difference EFAs
Analysis results also supported five-factor models for each of the two longitudinal
difference EFAs. Symptoms that load on a single factor in the difference analyses change
together and in the same direction over time. Notably, the two difference EFAs (for the 6- and
10-month durations) yielded nearly identical factors (Table 4 for change from 6-month to
16-month questionnaires; see appendix), which displayed a fairly similar structure to the
factors identified in each of the cross-sectional EFAs (Table 3). In order to compare model
fit between the two difference EFAs, a Wilcoxon signed rank test was utilized to test the
hypothesis that the baseline to 6-month difference EFA explained more variability than the 6
to 16-month difference EFA by examining the difference in individual variable communalities
between the two EFAs. The baseline to 6-month difference EFA had a significantly greater
average communality than the 6-month to 16-month difference EFA (p-value = 0.002; see
appendix).
Factor Scores
Utilizing the factors from the baseline questionnaire EFA, factor scores were estimated
and compared between EPOSs CRS groups (current, previous, and never at baseline; Figure 4
for factor 1, results for other factors not shown). A MANOVA was fit, comparing the four factor
scores in the three CRS status groups simultaneously (omitting the 4th factor, smell loss, as
it appeared to be non-normally distributed), and this indicated a significant difference in
mean factor scores between groups (p-value < 0.001; Figure 3). We observed that factor scores
were higher, in descending order, for current, past, then never. We also observed that CRS
factor scores showed a weaker but similar structure to individual questionnaire responses,
with low factor score values correlating more strongly with low scores or responses in the
following survey and vice versa (Figure 2, 4). Utilizing the factor scores from the 4th factor
(smell loss), two regressions were conducted: a logistic regression estimating the odds of
having a lower bounded factor score (-4) as a function of EPOSs CRS status, and a linear
regression comparing average factor scores for those with factor scores above -4 as a function
of EPOSs CRS status. Those with EPOSs CRS current at baseline had an odds ratio of
experiencing lower bound factor scores of 0.05 (95% CI: 0.04, 0.06) compared with EPOSs never,
and EPOSs previous had an odds ratio of experiencing lower bounded factor 4 scores of 0.15
(95% CI: 0.12, 0.18) compared with the EPOSs never group. Among those above the lower bounded
factor score, those at EPOSs CRS current had a factor 4 score 0.48 higher than those at EPOSs
CRS never at baseline (95% CI: 0.35, 0.60), while those at EPOSs CRS previous had a factor 4
score 0.22 higher than those at EPOSs CRS never at baseline (95% CI: 0.1, 0.34).
Discussion
Exploratory factor analysis was conducted as a measurement exercise to better
understand the relationship and categorization of nasal and sinus, asthma, headache,
constitutional, allergy, and ear symptoms utilizing both cross-sectional symptom questionnaire
responses and changes in symptom responses over time. All five EFAs presented consistent
findings of five underlying factors that were identifiable as congestion and discharge, pain
and pressure, asthma and constitutional, ear and eye, and smell loss factors. The baseline EFA
was used to estimate five factor scores within subjects, and all five were highest in subjects
who met EPOSs current CRS criteria and lowest in those who met EPOSs never CRS criteria. The 37
questions utilized in this analysis were developed to capture a wide range of overlapping
symptoms that occur in several co-morbid conditions, and to evaluate both the frequency and
severity of symptoms. Understanding how symptoms cluster within visit and across visits can
provide useful information that can aid clinical practice, inform symptom measurement in CRS
patients, and lead to hypotheses about the pathobiology underlying these symptoms.
We hypothesized several patterns of results in the EFAs. On the one hand, if there is an
underlying construct of CRS that can be measured with six cardinal symptoms that are mainly
interchangeable, as the EPOS criteria suggest, then these cardinal CRS symptoms could be
expected to load on a single factor. On the other hand, there are specific sinuses in which
inflammation has been associated with specific symptoms, such as maxillary sinus inflammation
associated with facial pain and pressure and ethmoid sinus inflammation associated with smell
loss. This pathobiologic consideration would suggest that at least three factors would be
identified with the cardinal CRS symptoms. We found that the 37 symptoms identified five
factors, and the six cardinal CRS symptoms loaded on three factors, both in cross-sectional
and longitudinal EFA models. This stability, and coherence with real biological processes, may
provide some evidence that these five factors may have an underlying common pathobiology.
Three of the five factors were composed of questions that are components of the EPOSs
criteria for CRS, specifically the nasal congestion and discharge, facial pain and pressure,
and smell loss factors, each of which is one of the cardinal EPOSs CRS symptoms (Wj et al.,
2012). As these symptoms loaded on different factors, it is possible that their underlying
pathobiologies differ from one another. While there was clear evidence that there was some
longitudinal change in symptom reporting, large transitions (two or more steps on the Likert
scale) were not very common. This suggests that most patients have long-duration symptoms
and perhaps a chronic pathobiologic process. We have previously reported that using the
cardinal symptoms to define EPOSs CRS categories (current, past, and never) resulted in large
transitions in patients meeting criteria for these categories over time; for example, among
the subjects who met EPOSs current CRS at baseline, almost half did not meet criteria for
current CRS six months later (Sundaresan, 2017, in Press).
In CRS, the sinuses are inflamed and swollen, and as such we can consider these factors
in the context of paranasal sinus opacification. The maxillary and ethmoid sinuses, when
congested or inflamed, can present symptoms of facial swelling and pain, which were present in
the facial pain and pressure factor (Wald et al., 1981). Sphenoid opacification, although
comparatively rare, can be associated with sometimes severe headaches (Sieskiewicz et al.,
2011). Frontal and ethmoid sinusitis can be associated with smell loss and nasal discharge,
consistent with the smell loss factor which was observed (Chang, Lee, Mo, Lee, & Kim, 2009).
The sinus symptoms that tended to cluster in our five factors have generally strong sinus
opacification correlates (Chang et al., 2009; Sieskiewicz et al., 2011; Wald et al., 1981). It
is possible that the factors that were identified by the EFA procedure may be associated with
sinus opacification, an analysis that is currently underway. While the factor analysis was
performed in over 3500 subjects, sinus CT scans were obtained from 646 of these subjects.
Furthermore, the ear and eye symptoms seen to predominate in factor 5 may be measuring
allergy presence and severity.
Because of the factor rotation method chosen, non-orthogonality between factors was
allowed as a means to separate symptoms into distinct factors insofar as possible. Thus,
examining the correlation between factors may provide some insight into the symptom
relationships. The congestion and discharge factor was moderately correlated with both the
pain and pressure factor (𝜌 = 0.67) as well as the ear and eye symptom factor (𝜌 = 0.63,
Figure 2). These correlations are not entirely unexpected as there are several pathobiologies
that could drive these patterns; other drivers connecting reporting of different symptoms
could also be at work.
Most questionnaire responses were well represented by the observed factor model as
measured by the communalities, which give a quantitative measure of the variability of each
question explained by our final five-factor model. In the EFA of the baseline questionnaire,
most of the EPOSs core questions had communality values above 0.6, indicating that the model
accounted for a majority of the variance in the reporting of these symptoms. In contrast,
several symptoms did not load heavily onto any factors in the baseline EFA, and as such also
had the lowest communalities of 0.26 (bad breath), 0.37 (cold symptoms), and 0.39 (fatigue),
suggesting that the model failed to capture much of the variability in those questions. In an
EFA setting, we do not necessarily expect all communalities to be high. Instead, these low
communality values reveal that either the drivers of these variables differed from the five
observed factors (high unique variance) or that these symptoms were subject to higher
measurement error than others.
It is common, in social and medical sciences, for factor analyses to include data from a
single point in time or in a context where time is unimportant (Browne et al., 2007). In this
study, we were able to incorporate repeated observations, providing us with not only three
responses for each symptom question, but also explicit measures of how symptoms changed
over time. EFA theory hypothesizes that there are real underlying mechanisms, including
common pathobiology, reporting phenomena, or other reasons, which manifest themselves in the
clustering of symptoms into the observed factors. If this hypothesis were correct, we would
expect factor composition (loadings) to be invariant to time (i.e., no seasonality); and if
there were sufficient variation in symptom reporting over time, we would expect not only to
see factors present themselves across time, but also to see the changes in symptoms do so
according to these same factors. Without sufficient symptom changing, we would likely not
have strong enough differences to identify these same factors. In this study, we did observe
the same factors in EFAs of cross-sectional responses as well as in the differences in
reporting over time, a finding consistent with the idea that these are real constructs driving
the observed symptoms. In the difference EFAs, there were almost always lower communalities
compared with the cross-sectional EFAs. Because responses to questionnaire items over time did
not evidence large changes (i.e., most change scores fell between -1 and +1, and many at 0), we
would expect the communalities in differences to be smaller than for their cross-sectional
counterparts. These differences in communalities also could be due to the different
correlations utilized, polychoric (implied Pearson correlations) versus Pearson correlations.
In addition, measurement and reporting error were likely larger in the difference measures as
we were combining two (not necessarily independent) error terms.
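The point about combined error terms can be written out explicitly. The notation is introduced here for illustration: e_1 and e_2 denote the measurement errors of the same item on two adjacent questionnaires, so the error in the change score is their difference.

```latex
\operatorname{Var}(e_2 - e_1)
  \;=\; \operatorname{Var}(e_1) + \operatorname{Var}(e_2)
  \;-\; 2\,\operatorname{Cov}(e_1, e_2)
```

If the two errors are uncorrelated and have equal variance, the error variance of the change score is twice that of a single response, while positively correlated errors (for example, a stable reporting tendency) partly cancel.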
The communalities in the difference scores were considerably lower than those in the
cross-sectional EFAs, ranging from 0.09 (fever) to 0.75 (smell loss) in the change from
6-month to 16-month EFA and from 0.26 (bad breath) to 0.95 (smell loss) in the baseline EFA.
These communalities show that smell loss was a very persistent symptom. The time durations of
the two change EFAs were not the same; the first change measure was over six months and the
second was over 10 months. We would expect communalities to be lower for the longer
duration EFA, and we found this to be the case. The mean communality of the first change
measure (0.392) was significantly larger than for the second (0.356, p-value = 0.001), which
was in line with expectations, suggesting that the difference from baseline to 6 months (6
month duration) captured variability better than the difference from 6 month to 16 month
questionnaires (10 month duration).
The multidimensional mean factor scores were compared between the three EPOSs CRS
groups (current, past, never) using MANOVA, logistic regression, and linear regression: a
significant difference was found, indicating a difference in factor score distributions
between the EPOSs CRS groups (p-value < 0.001). For all five factors, the mean estimated
factor scores tended to be highest among current CRS subjects, next highest for past CRS, and
lowest for never CRS. This result is not in itself surprising as the CRS groups here were
determined by the EPOSs definition of the disease, which itself is based on many of the
symptoms in the factors. However, the EFA included many questions beyond those used to define
EPOSs CRS status. The higher scores on factors comprising eye, ear, asthma, constitutional,
and headache symptoms may represent the common co-occurrence of allergy, asthma, and headache
disorders, for example, among CRS patients.
While there has been some prior work on CRS factors at a single point in time with the
SNOT-20 and SNOT-22 questionnaires, there has been no prior work on CRS factors using
longitudinal information on symptoms (Browne et al., 2007). Previous studies have examined
the decomposition of CRS and related symptoms using the Sino-nasal Outcome Test (SNOT)-20
and (SNOT)-22 questionnaires, which measure “symptoms and social/emotional consequences
of rhinosinusitis” through a range of symptom and health-related quality of life questions in
20 or 22 Likert-scale questions (DeConde, Bodner, Mace, & Smith, 2014; C. Hopkins et al.,
2006). These questions ask the participants to consider physical, functional, and emotional
symptoms they have experienced in the previous 2-week period (DeConde et al., 2014; C. Hopkins
et al., 2006). SNOT was designed to provide a single measure of patient quality of life and
CRS-related symptom severity, implicitly suggesting that each question provides information
regarding a single CRS construct or factor (Browne et al., 2007). One study found that
questions from the SNOT-22 decomposed into five clear factors (rhinologic symptoms, extranasal
rhinologic symptoms, ear and facial symptoms, psychological dysfunction, and sleep
dysfunction), not a single factor as the mission of the SNOT surveys may suggest (DeConde et
al., 2014). Similarly, an analysis of SNOT-20 revealed four underlying latent factors:
rhinological symptoms, ear and facial symptoms, sleep function, and psychological function
(Browne et al., 2007). Taken together, both of these studies suggested that question sets
typically thought of as measuring only CRS symptom severity or CRS-related quality of life
were actually measuring a variety of unobserved dimensions. Although the set of 37 questions
utilized in our study is much different in scope and aim than the SNOT questionnaires, the
results were consistent in revealing several factors.
Limitations and Further Work:
We observed both similarities and differences in patient characteristics between the
subjects who completed the baseline questionnaire and were included in the EFAs and those who
were excluded. Subjects in the EFA analysis, who returned all three questionnaires without
excessive missing data, were more likely to be white, more highly educated, and with higher
incomes. This may have resulted in selection bias that could have influenced the results.
This project relied extensively on questionnaire question responses. While direct and easy to
interpret or compare, survey methodologies similar to this encourage respondents to report
only the symptoms that were asked about, potentially missing symptom associations and
relationships with latent factors. In addition, there is the potential of same-source bias
impacting results, by which some individuals report in a systematic manner not necessarily
associated with symptoms, such as some individuals always or never reporting experiencing
symptoms. Finally, while we found strong evidence of clustering among 37 symptoms within
visits and over time, the ultimate utility of the findings will be in comparison to sinus
opacification, which awaits further analysis in a subset of the included subjects.
This EFA generated several hypotheses, mainly that the underlying factors identified by
the procedure are measuring real biological phenomena, including distinct sinus opacification,
allergies, and asthma severity. While interesting, studies examining objective measures of the
presence of these conditions along with these symptom questions are needed to provide
substantive evidence of this relationship.
Conclusion
In an analysis of 37 nasal and sinus, allergy, ear, asthma, headache, and constitutional
symptoms, we identified five underlying factors (congestion and discharge, pain and pressure,
asthma and constitutional, ear and eye, and smell loss) that were consistent in three
cross-sectional and two longitudinal change EFAs. Questions assessed presence, severity,
bother, and frequency of all 37 symptoms. The models generally explained a large proportion of
the variation in these symptoms within visits, and symptoms like smell loss showed much
persistence across visits. The findings have implications for how to identify patients with
CRS using questionnaires and may suggest significant misclassification in EPOS approaches to
CRS identification. They may explain why patients who meet EPOS criteria for CRS often do not
have evidence of sinus opacification. More direct evidence awaits analysis of the sinus CT
data in a subset of these subjects.
Tables & Figures
Table 1. Demographic information of the 3535 patients included in the current analysis and the 4312 patients who returned the baseline questionnaire but were not included in the current analysis.
Characteristic                          Excluded       Included
Male, n (%)                             1591 (36.9)    1335 (37.8)
Age, years, mean (SD)                   53.2 (16.8)    57.5 (14.8)
Smoking status
  Never, n (%)                          2253 (52.2)    2053 (58.1)
  Former, n (%)                         1299 (30.1)    1100 (31.1)
  Current, n (%)                        760 (17.6)     382 (10.8)
Income
  <$25,000, n (%)                       1599 (37.1)    1021 (28.9)
  $25,000-$50,000, n (%)                1098 (25.5)    970 (27.4)
  >$50,000, n (%)                       1083 (25.1)    1163 (32.9)
Body mass index, kg/m2, mean (SD)       30.3 (7.05)    30.0 (6.93)
Education level
  High school, n (%)                    1608 (37.3)    1209 (34.2)
  Some college, n (%)                   1364 (31.6)    979 (27.7)
  College graduate, n (%)               977 (22.7)     1171 (33.1)
Race/ethnicity
  White, n (%)                          3372 (87.5)    3323 (94)
  Black, n (%)                          264 (6.1)      78 (2.2)
  Hispanic, n (%)                       276 (6.4)      134 (3.8)
Table 2. Questions for the three cross-sectional questionnaires. Question responses were on a 5-item Likert scale*
Item #  Question text
On average, how often in the past † months have you had…
1   …blockage of your nasal passages (nasal congestion)?
2   …nasal discharge that was yellow or green in color?
3   …post-nasal drip?
4   …loss of sense of smell?
5   …facial pain?
6   …facial pressure?
Check the box that describes how often each problem has happened in the past † months, on average
7   …both of my nasal passages have blockage
8   …at least one of my nasal passages is completely blocked
9   …I have been very bothered by my blocked nasal passage(s)
10  …I have a lot of nasal discharge
11  …I have to blow my nose more than 10 times a day because of my nasal discharge
12  …I have been very bothered by my nasal discharge
13  …I have been coughing after I eat or lie down
14  …I have had mucus in my throat that felt like a lump or blockage
15  …I have been very bothered by my post-nasal drip
16  …I have not been able to smell anything
17  …I have been very bothered by my loss of sense of smell
18  …On a scale of 0 to 10, my facial pain has been at least a 5 (0 = no pain, 10 = worst pain)
19  …I have been very bothered by my facial pain
20  …My facial pressure has been severe
21  …I have been very bothered by my facial pressure
Check the box that describes how often, on average, you had the following in the past † months…
22  …headaches
23  …fevers
24  …coughing
25  …bad breath
26  …fatigue
27  …nasal itching
28  …sneezing
29  …eye itching
30  …eye tearing
31  …ear fullness
32  …ear pain
33  …ear pressure
34  …wheezing (breathing with whistling sound in chest)
35  …chest tightness
36  …shortness of breath
37  …cold/flu symptoms
* 1 = Never, 2 = Once in a while, 3 = Some of the time, 4 = Most of the time, and 5 = All the time.
† For the baseline and 16-month follow-up = 3 months, for the 6-month follow-up = 6 months.
Table 3. Factor loadings and symptom communalities from the exploratory factor analysis (EFA) of the 37 presence, severity, and secondary CRS symptoms at baseline. The EFA was fit using ordinary least squares and an oblimin rotation (number of patients = 3535). Loadings less than 0.3 were omitted for readability. Communalities represent the fraction of each symptom’s variability that was captured by the utilized five-factor model.
#   Item label                  Loading(s), Factors 1-5   Communality
1   Blockage                    0.65                      0.80
2   Discharge discolored        0.49                      0.61
3   PND                         0.84                      0.78
4   Smell loss                  0.95                      0.89
5   Facial pain                 0.83                      0.85
6   Facial pressure             0.76                      0.87
7   Blockage both sides         0.58                      0.72
8   Blockage complete           0.55                      0.73
9   Blockage bothered           0.61                      0.81
10  Discharge a lot             0.86                      0.84
11  Blow nose 10x daily         0.82                      0.76
12  Discharge bothered          0.84                      0.84
13  Cough lie down              0.72                      0.73
14  Lump in throat              0.69                      0.73
15  PND bothered                0.84                      0.85
16  Smell loss complete         0.97                      0.95
17  Smell loss bothered         0.92                      0.91
18  Facial pain 5+              0.83                      0.90
19  Facial pain bothered        0.84                      0.91
20  Facial pressure severe      0.77                      0.86
21  Facial pressure bothered    0.78                      0.90
22  Headaches                   0.67                      0.48
23  Fever                       0.43                      0.34
24  Coughing                    0.46, 0.5                 0.53
25  Bad breath                                            0.26
26  Fatigue                                               0.39
27  Nasal itching               0.56                      0.53
28  Sneezing                    0.31, 0.54                0.51
29  Eye itching                 0.72                      0.62
30  Eye tearing                 0.6                       0.49
31  Ear fullness                0.35, 0.54                0.62
32  Ear pain                    0.51, 0.49                0.65
33  Ear pressure                0.47, 0.46                0.63
34  Wheezing                    0.8                       0.66
35  Chest tightness             0.85                      0.78
36  Shortness of breath         0.82                      0.68
37  Cold/flu symptoms           0.44                      0.37
Table 4. Factor loadings and symptom communalities from the exploratory factor analysis (EFA) of the 37 presence, severity, and secondary CRS symptom changes from 6 to 16 months. The EFA was fit using ordinary least squares and an oblimin rotation (number of patients = 3535). Loadings less than 0.3 were omitted for readability. Communalities represent the fraction of each symptom’s variability that was captured by the utilized five-factor model.
#   Item label                  Loading(s), Factors 1-5   Communality
1   Blockage                    0.46                      0.3
2   Discharge discolored        0.32                      0.19
3   PND                         0.49                      0.28
4   Smell loss                  0.68                      0.47
5   Facial pain                 0.66                      0.47
6   Facial pressure             0.59                      0.41
7   Blockage both sides         0.43                      0.28
8   Blockage complete           0.34                      0.23
9   Blockage bothered           0.52                      0.4
10  Discharge a lot             0.72                      0.5
11  Blow nose 10x daily         0.66                      0.43
12  Discharge bothered          0.75                      0.54
13  Cough lie down              0.34, 0.33                0.29
14  Lump in throat              0.36                      0.27
15  PND bothered                0.57                      0.4
16  Smell loss complete         0.84                      0.69
17  Smell loss bothered         0.68                      0.48
18  Facial pain 5+              0.78                      0.6
19  Facial pain bothered        0.79                      0.63
20  Facial pressure severe      0.65                      0.44
21  Facial pressure bothered    0.72                      0.54
22  Headaches                                             0.14
23  Fever                                                 0.1
24  Coughing                    0.42                      0.29
25  Bad breath                                            0.12
26  Fatigue                                               0.14
27  Nasal itching               0.35                      0.19
28  Sneezing                    0.38                      0.25
29  Eye itching                 0.5                       0.29
30  Eye tearing                 0.51                      0.3
31  Ear fullness                0.64                      0.41
32  Ear pain                    0.53                      0.33
33  Ear pressure                0.63                      0.39
34  Wheezing                    0.58                      0.33
35  Chest tightness             0.64                      0.4
36  Shortness of breath         0.6                       0.37
37  Cold/flu symptoms           0.35                      0.24
76
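The communality column reported in the table above can be illustrated with a small numerical sketch. The pattern loadings and inter-factor correlations below are hypothetical values chosen for illustration, not estimates from the CRS data. Under an oblique rotation such as oblimin, the communality of item i is the i-th diagonal element of ΛΦΛᵀ, where Λ is the pattern-loading matrix and Φ the factor correlation matrix; it reduces to the row sum of squared loadings only when the factors are uncorrelated.

```python
import numpy as np

def communalities(loadings, factor_corr=None):
    """Return diag(L @ Phi @ L.T) for standardized items.

    loadings    : (items x factors) pattern-loading matrix
    factor_corr : (factors x factors) inter-factor correlation matrix;
                  the identity is assumed when omitted (orthogonal factors)
    """
    L = np.asarray(loadings, dtype=float)
    if factor_corr is None:
        Phi = np.eye(L.shape[1])
    else:
        Phi = np.asarray(factor_corr, dtype=float)
    # Row-wise quadratic form: sum over j, k of L[i, j] * Phi[j, k] * L[i, k]
    return np.einsum("ij,jk,ik->i", L, Phi, L)

# Hypothetical three-item, two-factor example (not thesis data)
pattern = np.array([[0.8, 0.0],
                    [0.7, 0.1],
                    [0.0, 0.6]])
phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])

h2_oblique = communalities(pattern, phi)   # accounts for correlated factors
h2_orthogonal = communalities(pattern)     # plain row sums of squared loadings
print(h2_oblique, h2_orthogonal)
```

The second item shows why the distinction matters: its cross-loading interacts with the factor correlation, so its communality under correlated factors exceeds the simple sum of squared loadings.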
Figure 1. Lasagna plot displaying the proportion of individuals with each given response to the question “On average, how often in the past 3 months have you had post-nasal drip?” at baseline and 6 months and 16 months later (1 = Never, 2 = Once in a while, 3 = Some of the time, 4 = Most of the time, 5 = All the time). Y-axis values indicate the number of patients with each particular response at baseline.
Figure 2. Inter-factor correlations at baseline from the baseline questionnaire exploratory factor analysis fit via ordinary least squares and an oblimin rotation.
Figure 3. Factor 1 (congestion and discharge) scores by CRS EPOS groups at baseline. Factor scores across CRS EPOS groups (current CRS, previous CRS, never CRS) for the congestion and discharge factor at baseline with number of individuals (N) in each group. Factor scores were estimated by the Item Response Theory (IRT) based scores method. X-axis was jittered to improve readability.
Figure 4. Continuous factor scores categorized to show longitudinal change across questionnaires for factor 1 (congestion and discharge). Factor scores were categorized as follows: scores < -1 were assigned -2; between -1 and -0.5, assigned -1; between -0.5 and 0.5, assigned 0; between 0.5 and 1, assigned 1; and > 1, assigned 2. Y-axis labels indicate the number of patients at baseline in each adjusted factor score group. Factor scores were estimated by the IRT method.
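The categorization rule in the Figure 4 caption can be written as a short function. This is a minimal sketch rather than the code used in the thesis: the function name is hypothetical, and because the caption does not say how scores exactly equal to a cut point were handled, the sketch assumes that -1 and 1 fall in the inner categories (consistent with the strict inequalities in the caption) and that -0.5 and 0.5 fall in category 0.

```python
import numpy as np

def categorize_factor_scores(scores):
    """Map continuous factor scores to the five ordered categories -2..2."""
    x = np.asarray(scores, dtype=float)
    # np.select returns the choice for the first condition that is true
    conditions = [x < -1.0, x < -0.5, x <= 0.5, x <= 1.0]
    choices = [-2, -1, 0, 1]
    return np.select(conditions, choices, default=2)

# Hypothetical scores, including the boundary values
example = [-1.5, -1.0, -0.7, -0.5, 0.0, 0.5, 0.8, 1.0, 1.2]
categories = categorize_factor_scores(example)
print(categories.tolist())
```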
Appendix
Figure A1. Scree plot for the baseline questionnaire displaying eigenvalues on the y-axis and their corresponding factor number on the x-axis.
Figure A2. Scree plot for the 6-month follow-up questionnaire displaying eigenvalues on the y-axis and their corresponding factor number on the x-axis.
Figure A3. Scree plot for the 16-month follow-up questionnaire displaying eigenvalues on the y-axis and their corresponding factor number on the x-axis.
Figure A4. Scree plot for the first difference (baseline to 6-month questionnaires) displaying eigenvalues on the y-axis and their corresponding factor number on the x-axis.
Figure A5. Scree plot for the second difference (6- to 16-month questionnaires) displaying eigenvalues on the y-axis and their corresponding factor number on the x-axis.
Table A1. Factor loadings and symptom communalities from the exploratory factor analysis (EFA) of the 37 presence, severity, and secondary CRS symptoms at 6-month follow-up. The EFA was fit using ordinary least squares and an oblimin rotation (number of patients = 3535). Loadings less than 0.3 were omitted for readability. Communalities represent the fraction of each symptom’s variability that was captured by the utilized five-factor model.
#   Item label                 Loadings (Factor 1-5 order)   Communality
1   Blockage                   0.51 0.65                     0.65
2   Discharge discolored       0.33 0.46                     0.46
3   PND                        0.76 0.65                     0.65
4   Smell loss                 0.98 0.9                      0.9
5   Facial pain                0.89 0.84                     0.84
6   Facial pressure            0.86 0.85                     0.85
7   Blockage both sides        0.5 0.61                      0.61
8   Blockage complete          0.39 0.53                     0.53
9   Blockage bothered          0.6 0.78                      0.78
10  Discharge a lot            0.87 0.81                     0.81
11  Blow nose 10x daily        0.82 0.7                      0.7
12  Discharge bothered         0.85                          0.82
13  Cough lie down             0.53 0.34                     0.64
14  Lump in throat             0.55                          0.64
15  PND bothered               0.79                          0.78
16  Smell loss complete        0.97                          0.94
17  Smell loss bothered        0.92                          0.89
18  Facial pain 5+             0.87                          0.88
19  Facial pain bothered       0.89                          0.91
20  Facial pressure severe     0.83                          0.84
21  Facial pressure bothered   0.85                          0.88
22  Headaches                  0.61                          0.49
23  Fever                      0.39                          0.38
24  Coughing                   0.4 0.5                       0.57
25  Bad breath                                               0.33
26  Fatigue                                                  0.43
27  Nasal itching              0.48                          0.51
28  Sneezing                   0.37 0.47                     0.53
29  Eye itching                0.68                          0.61
30  Eye tearing                0.6                           0.5
31  Ear fullness               0.71                          0.69
32  Ear pain                   0.31 0.64                     0.68
33  Ear pressure               0.66                          0.69
34  Wheezing                   0.81                          0.68
35  Chest tightness            0.87                          0.81
36  Shortness of breath        0.87                          0.73
37  Cold/flu symptoms          0.45                          0.46
Table A2. Factor loadings and symptom communalities from the exploratory factor analysis (EFA) of the 37 presence, severity, and secondary CRS symptoms at 16-month follow-up. The EFA was fit using ordinary least squares and an oblimin rotation (number of patients = 3535). Loadings less than 0.3 were omitted for readability. Communalities represent the fraction of each symptom’s variability that was captured by the utilized five-factor model.
#   Item label                 Loadings (Factor 1-5 order)   Communality
1   Blockage                   0.34 0.42                     0.63
2   Discharge discolored       0.3                           0.48
3   PND                        0.68                          0.64
4   Smell loss                 0.98                          0.9
5   Facial pain                0.88                          0.86
6   Facial pressure            0.88                          0.86
7   Blockage both sides        0.33 0.34                     0.57
8   Blockage complete          0.34 0.38                     0.57
9   Blockage bothered          0.4 0.43                      0.72
10  Discharge a lot            0.84                          0.78
11  Blow nose 10x daily        0.81                          0.67
12  Discharge bothered         0.85                          0.8
13  Cough lie down             0.38 0.42                     0.57
14  Lump in throat             0.38                          0.57
15  PND bothered               0.68                          0.72
16  Smell loss complete        0.99                          0.94
17  Smell loss bothered        0.93                          0.9
18  Facial pain 5+             0.92                          0.89
19  Facial pain bothered       0.92                          0.89
20  Facial pressure severe     0.87                          0.81
21  Facial pressure bothered   0.89                          0.86
22  Headaches                  0.58                          0.49
23  Fever                      0.35                          0.35
24  Coughing                   0.34 0.52                     0.6
25  Bad breath                                               0.35
26  Fatigue                    0.3                           0.45
27  Nasal itching              0.46                          0.56
28  Sneezing                   0.42 0.44                     0.56
29  Eye itching                0.63                          0.61
30  Eye tearing                0.55                          0.52
31  Ear fullness               0.59                          0.68
32  Ear pain                   0.41 0.5                      0.68
33  Ear pressure               0.34 0.54                     0.68
34  Wheezing                   0.85                          0.69
35  Chest tightness            0.85                          0.75
36  Shortness of breath        0.89                          0.72
37  Cold/flu symptoms          0.46                          0.49
Table A3. Factor loadings and symptom communalities from the exploratory factor analysis (EFA) of the 37 presence, severity, and secondary CRS symptom changes from baseline to 6 months. EFA was fit using ordinary least squares and an oblimin rotation (number of patients = 3535). Loadings less than 0.3 were omitted for readability. Communalities represent the fraction of each symptom’s variability that was captured by the utilized five-factor model.
#   Item label                 Loadings (Factor 1-5 order)   Communality
1   Blockage                   0.68                          0.49
2   Discharge discolored       0.43                          0.24
3   PND                        0.7                           0.45
4   Smell loss                 0.63                          0.42
5   Facial pain                0.67                          0.51
6   Facial pressure            0.61                          0.51
7   Blockage both sides        0.58                          0.39
8   Blockage complete          0.49                          0.34
9   Blockage bothered          0.58                          0.46
10  Discharge a lot            0.79                          0.57
11  Blow nose 10x daily        0.73                          0.52
12  Discharge bothered         0.75                          0.57
13  Cough lie down             0.49                          0.34
14  Lump in throat             0.52                          0.37
15  PND bothered               0.71                          0.55
16  Smell loss complete        0.87                          0.75
17  Smell loss bothered        0.71                          0.52
18  Facial pain 5+             0.79                          0.62
19  Facial pain bothered       0.83                          0.68
20  Facial pressure severe     0.71                          0.49
21  Facial pressure bothered   0.79                          0.62
22  Headaches                                                0.12
23  Fever                                                    0.09
24  Coughing                   0.48                          0.3
25  Bad breath                                               0.13
26  Fatigue                                                  0.14
27  Nasal itching                                            0.15
28  Sneezing                   0.31                          0.22
29  Eye itching                                              0.22
30  Eye tearing                                              0.21
31  Ear fullness               0.64                          0.43
32  Ear pain                   0.63                          0.41
33  Ear pressure               0.74                          0.53
34  Wheezing                   0.54                          0.3
35  Chest tightness            0.59                          0.34
36  Shortness of breath        0.57                          0.33
37  Cold/flu symptoms          0.39                          0.2
Table A4. Symptom communalities from the exploratory factor analysis (EFA) of the 37 presence, severity, and secondary CRS symptom changes from baseline to 6 months and 6 months to 16 months. EFA was fit using ordinary least squares and an oblimin rotation (number of patients = 3535).
#   Item label                 Baseline to 6-month follow-up   6- to 16-month follow-up
1   Blockage                   0.49                            0.3
2   Discharge discolored       0.24                            0.19
3   PND                        0.45                            0.28
4   Smell loss                 0.42                            0.47
5   Facial pain                0.51                            0.47
6   Facial pressure            0.51                            0.41
7   Blockage both sides        0.39                            0.28
8   Blockage complete          0.34                            0.23
9   Blockage bothered          0.46                            0.4
10  Discharge a lot            0.57                            0.5
11  Blow nose 10x daily        0.52                            0.43
12  Discharge bothered         0.57                            0.54
13  Cough lie down             0.34                            0.29
14  Lump in throat             0.37                            0.27
15  PND bothered               0.55                            0.4
16  Smell loss complete        0.75                            0.69
17  Smell loss bothered        0.52                            0.48
18  Facial pain 5+             0.62                            0.6
19  Facial pain bothered       0.68                            0.63
20  Facial pressure severe     0.49                            0.44
21  Facial pressure bothered   0.62                            0.54
22  Headaches                  0.12                            0.14
23  Fever                      0.09                            0.1
24  Coughing                   0.3                             0.29
25  Bad breath                 0.13                            0.12
26  Fatigue                    0.14                            0.14
27  Nasal itching              0.15                            0.19
28  Sneezing                   0.22                            0.25
29  Eye itching                0.22                            0.29
30  Eye tearing                0.21                            0.3
31  Ear fullness               0.43                            0.41
32  Ear pain                   0.41                            0.33
33  Ear pressure               0.53                            0.39
34  Wheezing                   0.3                             0.33
35  Chest tightness            0.34                            0.4
36  Shortness of breath        0.33                            0.37
37  Cold/flu symptoms          0.2                             0.24
Chapter 4 - Conclusion
There are many uses for, and methods of, conducting EFA. In this thesis, I have proposed a new method to identify the number of factors to extract, studied its performance in application to certain data structures, and applied EFA model selection and factor extraction methods to estimate latent structure in symptoms common in CRS and its related co-morbid conditions.
The proposed method for determining the number of factors to extract during an EFA adds to the vast literature addressing the problem of estimating 𝑚 and how to navigate this situation. This new 𝑚-estimation procedure performed well under a variety of simulated testing conditions which varied with regard to sample size (𝑁), data dimensionality (𝑃), and strength of correlation structure. Thus, this method may be a viable and versatile option for estimating the underlying factor model when sample size is sufficiently large.
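For readers who want a concrete picture of the cross-validated idea summarized in the abstract (increase 𝑚 until the discrepancy between the correlation matrix implied by one data partition and the correlation matrix observed in the other partition increases), a rough sketch follows. It is not the implementation evaluated in this thesis. Several choices are assumptions made only for illustration: a single random half split, principal-axis factoring as a stand-in for the ordinary least squares fit, the sum of squared off-diagonal differences as the discrepancy, and all function names.

```python
import numpy as np

def paf_loadings(R, m, n_iter=50):
    """Principal-axis factoring (a stand-in extraction, assumed for this sketch)."""
    # Squared multiple correlations as starting communalities
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:m]          # indices of the m largest eigenvalues
        lam = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        h2 = np.clip((lam ** 2).sum(axis=1), 0.0, 1.0)
    return lam

def holdout_discrepancy(R_train, R_test, m):
    """Squared off-diagonal distance between implied (train) and observed (test) correlations."""
    lam = paf_loadings(R_train, m)
    off_diag = ~np.eye(R_train.shape[0], dtype=bool)
    return float((((lam @ lam.T) - R_test)[off_diag] ** 2).sum())

def cv_number_of_factors(X, max_m, seed=0):
    """Increase m until the held-out discrepancy first increases; return the last m before that."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    half = X.shape[0] // 2
    R_train = np.corrcoef(X[idx[:half]], rowvar=False)
    R_test = np.corrcoef(X[idx[half:]], rowvar=False)
    best_m, best_d = 1, holdout_discrepancy(R_train, R_test, 1)
    for m in range(2, max_m + 1):
        d = holdout_discrepancy(R_train, R_test, m)
        if d > best_d:
            break
        best_m, best_d = m, d
    return best_m

# Hypothetical data: 8 items, 2 uncorrelated factors, loadings of 0.8
rng = np.random.default_rng(1)
true_loadings = np.zeros((8, 2))
true_loadings[:4, 0] = 0.8
true_loadings[4:, 1] = 0.8
X = rng.standard_normal((1000, 2)) @ true_loadings.T + 0.6 * rng.standard_normal((1000, 8))
m_hat = cv_number_of_factors(X, max_m=5)
print(m_hat)
```

With a structure this strong, adding the second factor sharply reduces the held-out discrepancy, so the rule cannot stop at one factor; whether it stops at exactly two depends on the split and on the extraction method.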
The CRS symptom EFA shed light on the studied symptoms, which decomposed into five interpretable factors, generating several hypothesized biological factor underpinnings. We were able to identify congestion and discharge, smell loss, ear and eye, asthma and constitutional, and facial pain and pressure symptom factors. These factors are consistent with understanding of biology and pathological processes in individual sinuses.
Our CRS study utilized Cattell’s scree test (5, 5, and 5 factors) and parallel analysis (5, 5, and 6 factors) in order to determine the number of factors to extract for the baseline, 6-month follow-up, and 16-month follow-up questionnaires. Interestingly, these methods estimated modestly different 𝑚 compared with the Kaiser eigenvalue-greater-than-1 rule (K1; 6, 5, and 7 factors) and quite different 𝑚 compared with other commonly utilized methods including the Bayesian information criterion (BIC; 15, 16, and 14 factors) and sample size adjusted BIC (SSBIC; 17, 17, and 20 factors), for baseline, 6-month, and 16-month follow-up questionnaires, respectively. Our newly proposed trace method also produced an optimal factor cardinality far removed from those presented in the CRS paper (13, 13, and 16 for baseline, 6-month, and 16-month questionnaires, respectively). In Chapter 2 we hypothesized that these differences may be explained by differing standards of fit implied by the different levels of specificity (dimensionality versus distributional form) addressed by the methods’ objective criteria. Further research is needed to elucidate this conjecture.
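Two of the eigenvalue-based rules compared above are simple enough to sketch directly. The code below is an illustrative sketch, not the analysis code behind the reported factor counts: it assumes continuous data and Pearson correlations (the CRS items are ordinal, for which other correlation estimates may be preferred), uses full correlation-matrix eigenvalues, and takes the 95th percentile of random-data eigenvalues as the parallel analysis threshold. Function names and the simulated data are hypothetical.

```python
import numpy as np

def kaiser_k1(R):
    """K1 rule: number of correlation-matrix eigenvalues greater than 1."""
    return int((np.linalg.eigvalsh(R) > 1.0).sum())

def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Retain factors while the observed eigenvalue exceeds the matching random-data threshold."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        Z = rng.standard_normal((n, p))           # uncorrelated data of the same shape
        random_eigs[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    thresholds = np.percentile(random_eigs, percentile, axis=0)
    m = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        m += 1
    return m

# Hypothetical data: 8 items, 2 uncorrelated factors, loadings of 0.8
rng = np.random.default_rng(1)
true_loadings = np.zeros((8, 2))
true_loadings[:4, 0] = 0.8
true_loadings[4:, 1] = 0.8
X = rng.standard_normal((2000, 2)) @ true_loadings.T + 0.6 * rng.standard_normal((2000, 8))

k1 = kaiser_k1(np.corrcoef(X, rowvar=False))
pa = parallel_analysis(X)
print(k1, pa)
```

In this clean setting both rules agree; disagreements such as those reported above arise when the eigenvalues just beyond the major factors sit close to the thresholds.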
Determining which method to utilize for 𝑚-estimation is difficult for several reasons. First, there is a large number of potential options for estimating the number of factors, with potentially different theoretical foundations, including likelihood-based methods, eigenvalue-based methods, graphical methods, and cross-validated or bootstrap methods. Investigators must first consider the purpose of their analysis when deciding which method to utilize. If interpretability or conciseness is of paramount importance, one may consider methods aligned with this ideal. Otherwise, for example, if one is placing emphasis on identifying the number of factors in an FA model hypothesized to literally underlie the data, methods attuned to that goal such as BIC or trace should be considered. This target determination is important, as it will drive the results and inference downstream in the analysis. This thesis has shown that the choice of 𝑚 can produce substantive qualitative and quantitative changes in loadings and factor interpretations. As such, this choice directly influences whether the researcher’s desired goal is attained with respect to unbiased estimation, verisimilitude, generalizability, or interpretability. The results are thus of high importance to researchers conducting EFAs.
We recommend that future work focus on the decision of which method(s) to use when attempting to find 𝑚 in EFA settings. The best process for choosing a method of estimating 𝑚 may very well be to first identify which interpretation of 𝑚 is relevant for the current study, thereby narrowing the field of potential methods. Following this, a practitioner will likely still be faced with choosing between several methods which may perform differently in application to the observed data. It is clear from the simulation study that under certain, possibly identifiable conditions, methods may outperform or underperform compared to their average efficacy across conditions. Because the strength of correlations and sample size of observed data were strong drivers of the efficacy of comparative methods, these attributes along with others should shed light on which method is most appropriate. Thus, it might be that observed correlation matrix attributes could be utilized within a single analysis to determine which methods would perform best, and future work in this area would also be valuable. A practitioner could then choose between methods with an understanding and anticipation of which methods may be most appropriate for their specific data at hand. Finally, agreement between methods may prove to be evidence that the agreed upon 𝑚 is desirable compared to other possibilities. Simulation studies such as the one described in the methods portion of the thesis can address these questions for us, by testing hypothesized methods against a known truth we generate.
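The logic of testing a method against a generated truth can be sketched in a few lines. This is an illustration of the general design only: the loading matrix is a fixed hypothetical example rather than one produced by the simulation procedure developed in this thesis, factors are assumed uncorrelated, and the K1 rule stands in for whichever 𝑚-estimation method is under study.

```python
import numpy as np

def simulate_from_loadings(loadings, n, rng):
    """Draw n observations with population correlation L L^T + diag(1 - h2)."""
    L = np.asarray(loadings, dtype=float)
    uniqueness = 1.0 - (L ** 2).sum(axis=1)
    if np.any(uniqueness <= 0):
        raise ValueError("each item's communality must be below 1")
    R = L @ L.T + np.diag(uniqueness)
    return rng.multivariate_normal(np.zeros(L.shape[0]), R, size=n)

def k1_estimate(X):
    """Example estimator: count of sample correlation eigenvalues above 1."""
    return int((np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)) > 1.0).sum())

def recovery_rate(loadings, n, estimator, n_reps=100, seed=0):
    """Share of replicates in which the estimator returns the true number of factors."""
    rng = np.random.default_rng(seed)
    true_m = np.asarray(loadings).shape[1]
    hits = sum(
        estimator(simulate_from_loadings(loadings, n, rng)) == true_m
        for _ in range(n_reps)
    )
    return hits / n_reps

def block_loadings(value):
    """Hypothetical 8-item, 2-factor simple structure with a common loading value."""
    L = np.zeros((8, 2))
    L[:4, 0] = value
    L[4:, 1] = value
    return L

strong_rate = recovery_rate(block_loadings(0.8), n=500, estimator=k1_estimate)
weak_rate = recovery_rate(block_loadings(0.4), n=100, estimator=k1_estimate)
print(strong_rate, weak_rate)
```

Varying the loading strength and sample size, as in the two calls above, is the simplest way to see how these attributes drive a method's accuracy.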
This thesis was able to identify similar latent structure and factor identity in three CRS symptom questionnaire administrations, as well as in the changes in symptom response scores between administrations. These EFAs were consistent with the hypothesis that biopathological phenomena underlie the observed symptom responses. However, objective sinus inflammation data must be incorporated in order to adequately assess this hypothesis.
The trace method showed promise as a viable additional method for EFA model selection, outperforming many commonly utilized methods across several simulation conditions. However, relative to the diverse range of fields where EFA is utilized, the simulated scenarios were small in scope, as the number of factors assessed was always between 5 and 10, the number of variables utilized was between 11 and 100, and the number of simulated samples was between 100 and 1000. This thesis brings to light alternative approaches to EFA and EFA model selection that we hope will prove useful as they are further refined.
Biography
Matthew K. Cole was born in 1993 in the USA.
Matt completed his undergraduate work at Sacred Heart University in Fairfield, Connecticut, where he majored in Biology and Mathematics. During his undergraduate education, he spent some time studying abroad in Italy and Germany and spent his other summers researching the ecology of the American Horseshoe Crab Limulus polyphemus in Long Island Sound.
In 2015, Matt began his Sc.M. at Johns Hopkins University. He was a teaching assistant for the Statistical Methods in Public Health course sequence.