SAMS: Data and Text Mining for Early Detection of...
SAMS: Data and Text Mining for Early Detection of Alzheimer’s Disease
November, 2016
Dr Christopher Bull
Aim of talk
• What is SAMS
• Data Capture
  – Problems and solutions to acquiring this type of text/data
• NLP
  – Tools used
    • Existing
    • Bespoke
• Reflections
Who am I?
Dr Christopher Bull
@ChrisBull88
[Insert dashing photo here]
• 2011 – PhD
• 2014 – SAMS (PDRA)
• 2016 – Mobile Age (PDRA)
------------------------------------------
• Software Engineering
• Education/Pedagogy
• Digital Health Technologies
SAMS Overview
Problem
• National Dementia Strategy (2009): early (‘timely’) diagnosis
• Only about 50% of people with dementia currently receive a diagnosis
• Diagnosis is often late: moderate or severe stages
What is Alzheimer’s Disease?
• Alzheimer’s is the most common cause of dementia (estimated 60%-80% of cases)
  – Dementia “describes symptoms that occur when the brain is affected by certain diseases or conditions”
• Symptoms include:
  – memory loss
  – difficulties with:
    • thinking
    • problem-solving
    • language
• Ultimately fatal
Source: Alzheimer’s Society
SAMS
Goal: Explore technology-dependent proxy markers of Alzheimer’s Disease
Aims:
• Non-intrusive capture of computer use
• Mine the data for trends and patterns
• Infer longitudinal changes in cognitive health
Team
Professor Pete Sawyer (School of Computing and Communications, Lancaster University)
Dr Paul Rayson (School of Computing and Communications, Lancaster University)
Dr Christopher Bull (School of Computing and Communications, Lancaster University)
Professor Alistair Sutcliffe (School of Computing and Communications, Lancaster University)
Professor Alistair Burns (National Clinical Director for Dementia in England; Institute of Brain, Behaviour and Mental Health, University of Manchester)
Dr Iracema Leroi (Institute of Brain, Behaviour and Mental Health, University of Manchester)
Gemma Stringer (Institute of Brain, Behaviour and Mental Health, University of Manchester)
Dr Samuel Couth (Institute of Brain, Behaviour and Mental Health, University of Manchester)
Professor John Keane (School of Computer Science, University of Manchester)
Dr Ann Gledson (School of Computer Science, University of Manchester)
Professor Clive Ballard (Wolfson Centre for Age-Related Diseases, King's College London)
Data Flows
Current Status
• Project funding ended September 2016
• On-going analysis
My Role in SAMS
…and Data Collection
My Role
• Data capture software
  – Software design/implementation
    • SAMS Manager
    • Browser extensions
  – Maintenance (obviously)
• Text Mining
  – Text extraction (reconstruction)
  – Reusing existing NLP pipeline (Wmatrix; UCREL)
  – Implementing extensions to pipeline for specific heuristics
• General Project Support (Team & Participants)
• Consider challenges
Challenges
• Volatility of participant computers
  – Unexpected updates
  – Varying shutdown procedures
  – Various software setups (anti-virus etc.)
• Low-powered computers (monitoring must not monopolise valuable resources)
  – Again, various hardware/software setups
• Ethical challenges
  – Privacy/Security
• Novel monitoring approaches
• Internet Explorer *sigh*
• Win10 roll-out mid-project
Abstract Architecture (Data Collection)
Browser Extensions
Desktop/Application Monitor Processes
Encrypt Logs
Secure SAMS Server
Manager Process
Collecting context, not just raw data
Desktop/Application Monitor Processes
• C# input event listeners
• Variety of mouse and keyboard events.
• Windows Automation API: UI Automation (UIA)
• Observe UI elements (and properties) a user interacts with.
• Provides context behind events.
Desktop/App Monitor
*Work of Dr Ann Gledson, Mancs
Browser Extensions
Web page black/whitelist (e.g. no https:// unless predefined)
JS DOM parsing (text fields and interactive elements)
JS event listeners & context identifier (Click, Mouse-Move, Focus etc.)
Log message caching (volatile)
Encryption
Write log files
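The first stage of the browser extension pipeline, the web page black/whitelist, might look roughly like the sketch below. It is written in Python for illustration only (the extensions themselves use JavaScript), and the host names, the `should_monitor` function and the exact precedence of the two lists are all assumptions; the slides only state that https:// pages are skipped unless predefined.

```python
from urllib.parse import urlsplit

# Hypothetical example lists; the real SAMS lists are not given in the slides.
WHITELIST = {"www.example.org"}    # https hosts that are explicitly allowed
BLACKLIST = {"bank.example.com"}   # hosts that are never monitored


def should_monitor(url: str) -> bool:
    """Decide whether events on this page may be logged (assumed policy)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in BLACKLIST:
        return False
    if parts.scheme == "https":
        # "no https:// unless predefined"
        return host in WHITELIST
    # Plain http pages are monitored; any other scheme is ignored.
    return parts.scheme == "http"


if __name__ == "__main__":
    for page in ("http://news.example.net/story",
                 "https://www.example.org/page",
                 "https://mail.example.net/inbox"):
        print(page, should_monitor(page))
```

Checking the blacklist first means a blocked host stays blocked even if it is later added to the whitelist by mistake.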
Browser Monitoring - Challenges
• Context to events
• Constantly changing or dynamic DOM
Manager/Uploader
• Process management
• Server communication
• Remote updating
• Log message caching and encryption
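Log message caching followed by encryption can be pictured as a small in-memory buffer that is serialised, encrypted and written out in batches. The sketch below is an assumption-laden illustration, not the SAMS Manager code: the `LogCache` class, its batch size and the JSON serialisation are hypothetical, and the cipher is passed in as a callable because the slides do not say which encryption scheme was used.

```python
import json
from typing import Callable, List


class LogCache:
    """Buffer log events in memory and write them out as encrypted batches."""

    def __init__(self,
                 encrypt: Callable[[bytes], bytes],
                 write: Callable[[bytes], None],
                 max_messages: int = 3) -> None:
        self._encrypt = encrypt      # caller-supplied cipher (assumption)
        self._write = write          # e.g. append to a log file or upload queue
        self._max_messages = max_messages
        self._pending: List[dict] = []

    def add(self, event: dict) -> None:
        """Cache one event; flush automatically once the batch is full."""
        self._pending.append(event)
        if len(self._pending) >= self._max_messages:
            self.flush()

    def flush(self) -> None:
        """Serialise, encrypt and write the cached events, then clear the cache."""
        if not self._pending:
            return
        payload = json.dumps(self._pending).encode("utf-8")
        self._write(self._encrypt(payload))
        self._pending = []


if __name__ == "__main__":
    batches: List[bytes] = []
    # Stand-in transform for the demo only: reversing bytes is NOT encryption.
    cache = LogCache(encrypt=lambda data: data[::-1], write=batches.append)
    for name in ("click", "focus", "keypress"):
        cache.add({"event": name})
    print(len(batches))
```

Because the cache is volatile, anything not yet flushed is lost on an unexpected shutdown, so a real deployment would also flush on a timer or when the process exits.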
Manager (2)
Early UI
Project Support
• Participant Status Checker
  – For clinical & tech teams
  – + Android app
• Phone support
  – Clinical team
  – Participants
• Participant visits (installs)
Existing Study(s)
Nun Study:
• Measures obtained from autobiographies
• written over a 60-year span (age 22 to 83).
Grammatical complexity
  – No dementia: mean 4.78; declined .04 units per year
  – Dementia: mean 3.86; declined .03 units per year
Idea density
  – No dementia: mean 5.35 propositions per 10 words; declined .03 units per year
  – Dementia: mean 4.34 propositions per 10 words; declined .02 units per year
Propositional Idea Density (P-density)
• “Idea density […] is the number of expressed propositions divided by the number of words. In terms of semantics, idea density is a measure of the extent to which the speaker is making assertions (or asking questions) rather than just referring to entities”
  – “Automatic measurement of propositional idea density from part-of-speech tagging” (Brown et al., 2008)
• Existing implementation
  – CPIDR (Computerized Propositional Idea Density Rater)
  – (pronounced “spider”)
  – only tool to automate this*
*At time of starting SAMS
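The definition quoted above (propositions divided by words) can be illustrated with a deliberately simplified sketch. It assumes pre-tagged input with a hypothetical coarse tag set and counts verbs, adjectives, adverbs, prepositions and conjunctions as propositions, which is only the starting approximation of the part-of-speech approach; the adjustment rules that a tool such as CPIDR applies afterwards are omitted, so the numbers will not match CPIDR or SNOWCAT output.

```python
from typing import List, Tuple

# Hypothetical coarse tags treated as proposition-bearing in this sketch.
PROPOSITION_TAGS = {"VERB", "ADJ", "ADV", "PREP", "CONJ"}


def idea_density(tagged_words: List[Tuple[str, str]]) -> float:
    """Return propositions / words for a list of (word, tag) pairs."""
    if not tagged_words:
        return 0.0
    propositions = sum(1 for _, tag in tagged_words if tag in PROPOSITION_TAGS)
    return propositions / len(tagged_words)


if __name__ == "__main__":
    # Hand-tagged example sentence (tags are illustrative, not CLAWS output).
    sentence = [("The", "DET"), ("old", "ADJ"), ("dog", "NOUN"),
                ("barked", "VERB"), ("loudly", "ADV"), ("at", "PREP"),
                ("the", "DET"), ("postman", "NOUN")]
    print(idea_density(sentence))  # 4 propositions / 8 words = 0.5
```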
Kusari (Toolchain manager)
“Toolchain and data dependency manager for use with conventional NLP toolchains”
Dr Steve Wattam
https://delta.lancs.ac.uk/Steve/kusari
https://delta.lancs.ac.uk/Steve/kusari-links
Toolchain
Stage                   Tool      URL                              Language
Spelling Variation      VARD      ucrel.lancs.ac.uk/vard/          Java
Part Of Speech Tagger   CLAWS     ucrel.lancs.ac.uk/claws/         C
Semantic Tagger         USAS      ucrel.lancs.ac.uk/usas/          C
Frequency Lists         Tmatrix   ucrel.lancs.ac.uk/wmatrix/       C
SAMS software           SNOWCAT   delta.lancs.ac.uk/SAMS/SNOWCAT   Java
SNOWCAT
Sams aNalysis of Output from Wmatrix for the Cognitive Assessment of Text
• Input
  – Tmatrix (FQLs)
  – USAS (Sem)
• Output
  – CSV of metrics
SNOWCAT: Sample Output (1/2)
• Total Words (MWE), 26278
• Total Words, 27787
• Vocabulary size (MWE), 3533
• Vocabulary size, 3444
• Type:Token (ratio; MWE), 0.134
• Type:Token (ratio), 0.124
• Type:Token (normalised ratio), 0.403
• Words occurring once (MWE), 1842
• Adjective (total; MWE), 1288
• Adjective (ratio; MWE), 0.049
• Noun (total; MWE), 4280
• Noun (ratio; MWE), 0.163
• …
SNOWCAT: Sample Output (2/2)
• Pronoun (total; MWE), 2672
• Pronoun (ratio; MWE), 0.102
• Verb (total; MWE), 6135
• Verb (ratio; MWE), 0.233
• Content words (total; MWE), 13757
• Content words (ratio; MWE), 0.524
• Filler words (total; MWE), 183
• Filler words (ratio; MWE), 0.007
• Noun:Verb (ratio; MWE), 0.698
• Mean Length of Utterance, 27.653
• VARD Variant (total), 69
• VARD Variant (ratio), 0.003
• Propositional Idea Density, 0.565
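Several of the ratio metrics in the sample output can be reproduced from the listed totals, which suggests they are simple quotients rounded to three decimal places. The sketch below is an arithmetic check rather than SNOWCAT's code: the `ratio` helper is hypothetical, and using "Total Words (MWE)" as the denominator for the word-class ratios is an inference from the numbers. The normalised type:token ratio is left out because its method is not stated.

```python
# Counts copied from the sample output above.
counts = {
    "Total Words (MWE)": 26278,
    "Total Words": 27787,
    "Vocabulary size (MWE)": 3533,
    "Vocabulary size": 3444,
    "Noun (total; MWE)": 4280,
    "Verb (total; MWE)": 6135,
    "Content words (total; MWE)": 13757,
}


def ratio(numerator: int, denominator: int) -> float:
    """Quotient rounded to three decimal places, as in the sample output."""
    return round(numerator / denominator, 3)


total_mwe = counts["Total Words (MWE)"]
derived = {
    "Type:Token (ratio; MWE)": ratio(counts["Vocabulary size (MWE)"], total_mwe),
    "Type:Token (ratio)": ratio(counts["Vocabulary size"], counts["Total Words"]),
    "Noun (ratio; MWE)": ratio(counts["Noun (total; MWE)"], total_mwe),
    "Verb (ratio; MWE)": ratio(counts["Verb (total; MWE)"], total_mwe),
    "Content words (ratio; MWE)": ratio(counts["Content words (total; MWE)"], total_mwe),
    "Noun:Verb (ratio; MWE)": ratio(counts["Noun (total; MWE)"], counts["Verb (total; MWE)"]),
}

if __name__ == "__main__":
    for name, value in derived.items():
        print(f"{name}, {value}")
```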
Early (unpublished) Results
• Validate P-Density (comparison to CPIDR tool)
• Uses novelist study to explore usefulness of SNOWCAT metrics
• [Show spreadsheet of early (unpublished) results]
Charts
What’s next?
• Continue NLP analysis
• Correlate Data and Text Mining analyses
• …SAMS 2.0
Lessons Learnt
• Ethical process
  – Affects fundamental design decisions
• Complexity of data collection outside of “lab setting”
• Validating other studies/claims is important
Publications
ucrel.lancs.ac.uk/sams/papers.php
• Combining data mining and text mining for detection of early stage dementia: the SAMS framework. Bull, C., Asfiandy, D., Gledson, A., Mellor, J., Couth, S., Stringer, G., Rayson, P., Sutcliffe, A., Keane, J., Zeng, X., Burns, A., Leroi, I., Ballard, C., & Sawyer, P. (2016). In LREC-2016 Workshop: RaPID-2016.
• From Click to Cognition: Detecting cognitive decline through daily computer use. Stringer, G., Sawyer, P., Sutcliffe, A., & Leroi, I. (2015). In D. Bruno (Ed.), The Preservation of Memory: Theory and Practice for Clinical and Non-Clinical Populations (pp. 93-103). Hove, UK: Psychology Press.
• Dementia and Social Sustainability: Challenges for Software Engineering. Sawyer, P., Sutcliffe, A., Rayson, P., & Bull, C. (2015). In 37th International Conference on Software Engineering (ICSE'15) (pp. 527-530). Florence, Italy: IEEE. DOI: 10.1109/ICSE.2015.188
• Discovering affect-laden requirements to achieve system acceptance. Sutcliffe, A., Rayson, P., Bull, C., & Sawyer, P. (2014). In 22nd IEEE International Requirements Engineering Conference (RE'14) (pp. 173-182). IEEE. DOI: 10.1109/RE.2014.6912259