Topic Models for Semantic Memory and Information Extraction Mark Steyvers Department of Cognitive...
-
date post
20-Jan-2016 -
Category
Documents
-
view
219 -
download
0
Transcript of Topic Models for Semantic Memory and Information Extraction Mark Steyvers Department of Cognitive...
Topic Models for Semantic Memory and Information Extraction
Mark SteyversDepartment of Cognitive Sciences
University of California, Irvine
Joint work with:Tom Griffiths, UC BerkeleyPadhraic Smyth, UC IrvineDave Newman, UC IrvineChaitanya Chemudugunta, UC Irvine
Human MemoryInformation
Retrieval
Finding relevant memories
Extracting meaning
Finding relevant documents
Extracting content
Semantic Representations
• What are suitable representations?
• How can it be learned from experience?
MONEY
• Structured representations • Encoding of propositions• Hand coded examples• Not inferred from data
CASH
BILL
LOAN
MONEY
BANK
RIVER STREAM
Two approaches to semantic representation
Semantic networkse.g. Collins & Quillian 1969
Semantic Spacese.g. Landauer & Dumais, 1997
LOANCASH
RIVER
BANK
BILL
• Relatively unstructured representations• Can be learned from data
Overview
I Probabilistic Topic Models
II Associative Memory
III Episodic Memory
IV Applications
V Conclusion
Probabilistic Topic Models
• Extract topics from large text collections
unsupervised
Bayesian statistical techniques
• Topics provide quick summary of content / gist
What are topics?
• A topic represents a probability distribution over words
– Related words get high probability in same topic
• Example topics extracted from psychology grant applications:
brainfmri
imagingfunctional
mrisubjects
magneticresonance
neuroimagingstructural
schizophreniapatientsdeficits
schizophrenicpsychosissubjects
psychoticdysfunction
abnormalitiesclinical
memoryworking
memoriestasks
retrievalencodingcognitive
processingrecognition
performance
diseasead
alzheimerdiabetes
cardiovascularinsulin
vascularblood
clinicalindividuals
Probability distribution over words. Most likely words listed at the top
Model Input
• Matrix of counts: number of times words occur in documents
• Note:
– word order is lost: “bag of words” approach
– Some function words are deleted: “the”, “a”, “in”
documents
wor
ds
1
…
16
…
0
…
FOOD
…
6190ITALIAN
2012PASTA
3034PIZZA
Doc3 … Doc2Doc1
Model Assumptions
• Each topic is a probability distribution over words
• Each document is modeled as a mixture of topics
Generative Model for Documents
TOPICS MIXTURE
TOPIC TOPIC
WORD WORD
...
...
1. for each document, choosea mixture of topics
2. sample a topic [1..T] from the mixture
3. sample a word from the topic
(“LDA” by Blei, Ng, & Jordan, 2002; “pLSI” by Hoffman, 1999; Griffith & Steyvers, 2002 & 2004)(“LDA” by Blei, Ng, & Jordan, 2002; “pLSI” by Hoffman, 1999; Griffith & Steyvers, 2002 & 2004)
Document = mixture of topics
brainfmri
imagingfunctional
mrisubjects
magneticresonance
neuroimagingstructural
schizophreniapatientsdeficits
schizophrenicpsychosissubjects
psychoticdysfunction
abnormalitiesclinical
memoryworking
memoriestasks
retrievalencodingcognitive
processingrecognition
performance
diseasead
alzheimerdiabetes
cardiovascularinsulin
vascularblood
clinicalindividuals
Document
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
80%20%
Document
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
100%
The Generative Model
• Conditional probability of word in document:
1
| | |T
j
P w d P w z j P z j d
word probability in topic j
probability of topic jin document
Inverting the generative model
• Generative model gives procedure to obtain corpus from
topics and mixing proportions
• Inverting the model involves extracting topics and mixing
proportions per document from corpus
Inverting the generative model
• Estimate topic assignments
– Each occurrence of a word is assigned to one topic [ 1..T ]
• Large state space
– With T=300 topics and 6,000,000 words, the size of the
discrete state space is (300)6,000,000
• Need efficient sampling techniques
– Markov Chain Monte Carlo (MCMC) with Gibbs sampling
INPUT: word-document counts (word order is irrelevant)
OUTPUT:topic assignments to each word P( zi )likely words in each topic P( w | z )likely topics in each document (“gist”) P( z | d )
Summary
THEORYSCIENTISTS
EXPERIMENTOBSERVATIONS
SCIENTIFICEXPERIMENTSHYPOTHESIS
EXPLAINSCIENTISTOBSERVED
EXPLANATIONBASED
OBSERVATIONIDEA
EVIDENCETHEORIESBELIEVED
DISCOVEREDOBSERVE
FACTS
SPACEEARTHMOON
PLANETROCKET
MARSORBIT
ASTRONAUTSFIRST
SPACECRAFTJUPITER
SATELLITESATELLITES
ATMOSPHERESPACESHIPSURFACE
SCIENTISTSASTRONAUT
SATURNMILES
ARTPAINT
ARTISTPAINTINGPAINTEDARTISTSMUSEUM
WORKPAINTINGS
STYLEPICTURES
WORKSOWN
SCULPTUREPAINTER
ARTSBEAUTIFUL
DESIGNSPORTRAITPAINTERS
STUDENTSTEACHERSTUDENT
TEACHERSTEACHING
CLASSCLASSROOM
SCHOOLLEARNING
PUPILSCONTENT
INSTRUCTIONTAUGHTGROUPGRADE
SHOULDGRADESCLASSES
PUPILGIVEN
BRAINNERVESENSE
SENSESARE
NERVOUSNERVES
BODYSMELLTASTETOUCH
MESSAGESIMPULSES
CORDORGANSSPINALFIBERS
SENSORYPAIN
IS
CURRENTELECTRICITY
ELECTRICCIRCUIT
ISELECTRICAL
VOLTAGEFLOW
BATTERYWIRE
WIRESSWITCH
CONNECTEDELECTRONSRESISTANCE
POWERCONDUCTORS
CIRCUITSTUBE
NEGATIVE
A selection from 500 topics [P(w|z = j)]
FIELDMAGNETICMAGNET
WIRENEEDLE
CURRENTCOIL
POLESIRON
COMPASSLINESCORE
ELECTRICDIRECTION
FORCEMAGNETS
BEMAGNETISM
POLEINDUCED
SCIENCESTUDY
SCIENTISTSSCIENTIFIC
KNOWLEDGEWORK
RESEARCHCHEMISTRY
TECHNOLOGYMANY
MATHEMATICSBIOLOGY
FIELDPHYSICS
LABORATORYSTUDIESWORLD
SCIENTISTSTUDYINGSCIENCES
BALLGAMETEAM
FOOTBALLBASEBALLPLAYERS
PLAYFIELD
PLAYERBASKETBALL
COACHPLAYEDPLAYING
HITTENNISTEAMSGAMESSPORTS
BATTERRY
JOBWORKJOBS
CAREEREXPERIENCE
EMPLOYMENTOPPORTUNITIES
WORKINGTRAINING
SKILLSCAREERS
POSITIONSFIND
POSITIONFIELD
OCCUPATIONSREQUIRE
OPPORTUNITYEARNABLE
Words can have high probability in multiple topics
Disambiguation
• Give example for field
Topics versus LSA
• Latent Semantic Analysis (LSI/LSA)– Projects words into a K-dimensional hidden space– Less interpretable– Not generalizable– Not as accurate
MONEY
LOANCASH
RIVER
BANK
BILL
(high dimensional space)
Modeling Word Association
Word Association(norms from Nelson et al. 1998)
CUE: PLANET
people
EARTH STARS SPACE
SUN MARS
UNIVERSE SATURN GALAXY
associate number
12345678
Word Association(norms from Nelson et al. 1998)
CUE: PLANET
(vocabulary = 5000+ words)
Topic model for Word Association
• Association as a problem of prediction:
– Given that a single word is observed, predict what other
words might occur in that context
• Under a single topic assumption:
2 1 2 1| | |z
P w w P w z P z w
Response Cue
z
wzPzwPwwP )|()|()|( 1212
people
EARTH STARS SPACE
SUN MARS
UNIVERSE SATURN GALAXY
model
STARS STAR SUN
EARTH SPACE
SKY PLANET
UNIVERSE
associate number
12345678
Word Association(norms from Nelson et al. 1998)
CUE: PLANET
First associate “EARTH” is in the set of 8 associates (from the model)
P( set contains first associate )
100
101
102
103
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1P
(se
t co
nta
ins
first
ass
oci
ate
)
Set size
LSATOPICS
LSA
TOPICS
Why would LSA perform worse?
• Cosine similarity measure imposes unnecessary
constraints on representations, e.g.
– Symmetry
– Triangle inequality
Violation of triangle inequality
Can find associations that violate this:
A
B C
AC AB + BC
SOCCER FIELD MAGNETIC
No Triangle Inequality with Topics
SOCCER
MAGNETICFIELD
TOPIC 1 TOPIC 2
Topic structure easily explains violations of triangle inequality
Modeling Episodic Memory
Semantic Isolation Effect
Study this list:PEAS, CARROTS, BEANS, SPINACH, LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH
HAMMER,PEAS,
CARROTS,...
Semantic Isolation Effect
• Verbal explanations:
– Attention, surprise, distinctiveness
• Our approach: memory system is trading off two encoding
resources
– Storing specific words (e.g. “hammer”)
– Storing general theme of list (e.g. “vegetables”)
Computational Problem
• How to tradeoff specificity and generality?
– Remembering detail and gist
Special word topic model
Special word topic (SW) model
• Each word can be generated via one route
– Topics
– Special words distribution (unique to a document)
• Conditional prob. of a word under a document:
| ( 0 | ) ( | )
( 1 | ) ( | )
topics
special
P w d P x d P w d
P x d P w d
Study wordsPEAS
CARROTS BEANS
SPINACH LETTUCE HAMMER
TOMATOES CORN
CABBAGE SQUASH
Retrieval Probability
0.00 0.01 0.02
HAMMER
BEANS
CORN
PEAS
SPINACH
CABBAGE
LETTUCE
CARROTS
SQUASH
TOMATOES
Special Word Probability
0.00 0.05 0.10 0.15
HAMMER
PEAS
SPINACH
CABBAGE
CARROTS
LETTUCE
SQUASH
BEANS
TOMATOES
CORN
Switch Probability
0.0 0.5 1.0
Verbatim
Topic
Topic Probability
0.0 0.1 0.2
VEGETABLES
FURNITURE
TOOLS
ENCODING RETRIEVAL
Special
Hunt & Lamb (2001 exp. 1)
OUTLIER LIST
PEAS
CARROTS
BEANS
SPINACH
LETTUCE
HAMMER
TOMATOES
CORN
CABBAGE
SQUASH
CONTROL LIST
SAW
SCREW
CHISEL
DRILL
SANDPAPER
HAMMER
NAILS
BENCH
RULER
ANVIL
outlier list pure listP
rob.
of
Rec
all
0.0
0.2
0.4
0.6
0.8
1.0TargetBackground
DATA
Model Predictions
outlier list pure list
Pro
b. o
f R
ecal
l
0.0
0.2
0.4
0.6
0.8
1.0TargetBackground
DATA PREDICTED
outlier list pure list
Nee
d P
roba
bilit
y0.00
0.01
0.02
0.03
0.04
0.05TargetBackground
False memory effects
MADFEARHATE
SMOOTHNAVYHEAT
SALADTUNE
COURTSCANDYPALACEPLUSHTOOTHBLIND
WINTER
Number of Associates
3 6 9
MADFEARHATERAGE
TEMPERFURY
SALADTUNE
COURTSCANDYPALACEPLUSHTOOTHBLIND
WINTER
MADFEARHATERAGE
TEMPERFURY
WRATHHAPPYFIGHT
CANDYPALACEPLUSHTOOTHBLIND
WINTER
(lure = ANGER)
Number of Associates Studied
3 6 9 12 15
Pro
b. o
f R
ecal
l
0.0
0.2
0.4
0.6
0.8
1.0Studied itemsNonstudied (lure)
DATA
Robinson & Roediger (1997)
Number of Associates Studied
3 6 9 12 15
Pro
b. o
f R
etri
eval
0.00
0.01
0.02
0.03Studied associatesNonstudied (lure)
PREDICTED
Relation to Dual Process Models
• Gist/Verbatim distinction (e.g. Reyna, Brainerd, 1995)
– Maps onto topics and special words
• Our approach specifies both encoding and retrieval
representations and processes
– Routes are not independent
– Model explains performance for actual word lists
Applications I
Topics provide quick summary of content
• What is in this corpus?
• What is in this document?
• What are the topical trends over time?
• Who writes on this topic?
330,000 articles
2000-2002
Analyzing the New York Times
Three investigations began Thursday into the securities and exchange_commission's choice of william_webster to head a new board overseeing the accounting profession. house and senate_democrats called for the resignations of both judge_webster and harvey_pitt, the commission's chairman. The white_house expressed support for judge_webster as well as for harvey_pitt, who was harshly criticized Thursday for failing to inform other commissioners before they approved the choice of judge_webster that he had led the audit committee of a company facing fraud accusations. “The president still has confidence in harvey_pitt,” said dan_bartlett, bush's communications director …
Extracted Named Entities
• Used standard algorithms to extract named entities:
- People- Places- Organizations
Standard Topic Model with Entities
team 0.028 tour 0.039 holiday 0.071 award 0.026play 0.015 rider 0.029 gift 0.050 film 0.020game 0.013 riding 0.017 toy 0.023 actor 0.020season 0.012 bike 0.016 season 0.019 nomination 0.019final 0.011 team 0.016 doll 0.014 movie 0.015games 0.011 stage 0.014 tree 0.011 actress 0.011point 0.011 race 0.013 present 0.008 won 0.011series 0.011 won 0.012 giving 0.008 director 0.010player 0.010 bicycle 0.010 special 0.007 nominated 0.010coach 0.009 road 0.009 shopping 0.007 supporting 0.010playoff 0.009 hour 0.009 family 0.007 winner 0.008championship 0.007 scooter 0.008 celebration 0.007 picture 0.008playing 0.006 mountain 0.008 card 0.007 performance 0.007win 0.006 place 0.008 tradition 0.006 nominees 0.007LAKERS 0.062 LANCE-ARMSTRONG 0.021 CHRISTMAS 0.058 OSCAR 0.035SHAQUILLE-O-NEAL0.028 FRANCE 0.011 THANKSGIVING 0.018 ACADEMY 0.020KOBE-BRYANT 0.028 JAN-ULLRICH 0.003 SANTA-CLAUS 0.009 HOLLYWOOD 0.009PHIL-JACKSON 0.019 LANCE 0.003 BARBIE 0.004 DENZEL-WASHINGTON 0.006NBA 0.013 U-S-POSTAL-SERVICE 0.002 HANUKKAH 0.003 JULIA-ROBERT 0.005SACRAMENTO 0.007 MARCO-PANTANI 0.002 MATTEL 0.003 RUSSELL-CROWE 0.005RICK-FOX 0.007 PARIS 0.002 GRINCH 0.003 TOM-HANK 0.005PORTLAND 0.006 ALPS 0.002 HALLMARK 0.002 STEVEN-SODERBERGH 0.004ROBERT-HORRY 0.006 PYRENEES 0.001 EASTER 0.002 ERIN-BROCKOVICH 0.003DEREK-FISHER 0.006 SPAIN 0.001 HASBRO 0.002 KEVIN-SPACEY 0.003
Basketball Holidays OscarsTour de France
computer 0.069 play 0.030technology 0.026 show 0.029system 0.015 stage 0.022digital 0.014 theater 0.022chip 0.013 director 0.017software 0.013 production 0.017machine 0.011 performance 0.016devices 0.010 dance 0.014machines 0.010 audience 0.014video 0.009 festival 0.013Companies 1.000 Theater 0.960
Music 0.040
IBM 0.074 BROADWAY 0.119 BACH 0.035APPLE 0.061 NEW_YORK 0.044 BEETHOVEN 0.026INTEL 0.059 SHAKESPEARE 0.029 LOUIS_ARMSTRONG 0.019MICROSOFT 0.053 THEATER 0.022 MOZART 0.019COMPAQ 0.041 LONDON 0.019 CARNEGIE_HALL 0.017SONY 0.029 GUINNESS 0.018 LATIN 0.017DELL 0.019 TONY 0.016HP 0.018 LINCOLN_CTR 0.015
ArtsComputers
MusicCompanies Theatre
Jan00 Jul00 Jan01 Jul01 Jan02 Jul02 Jan030
50
100
Jan00 Jul00 Jan01 Jul01 Jan02 Jul02 Jan030
5
10
15
Topic Trends
Tour-de-France
Anthrax
Jan00 Jul00 Jan01 Jul01 Jan02 Jul02 Jan030
10
20
30Quarterly Earnings
Proportion of words assigned to topic for that
time slice
Example of Extracted Entity-Topic Network
Muslim_Militance
Mid_East_Conflict
Palestinian_Territories
Pakistan_Indian_War
FBI_Investigation
Detainees
Mid_East_Peace
US_Military
Religion
Terrorist_Attacks
Afghanistan_War
AL_QAEDA
HAMID_KARZAIMOHAMMED
MOHAMMED_ATTA
NORTHERN_ALLIANCE
BIN_LADEN
TALIBAN
ZAWAHIRI
YASSER_ARAFAT
EHUD_BARAK
ARIEL_SHARON
HAMAS
AL_HAZMI
KING_HUSSEIN
Prediction of Missing Entities in Text
Shares of XXXX slid 8 percent, or $1.10, to $12.65 Tuesday, as major
credit agencies said the conglomerate would still be challenged in repaying
its debts, despite raising $4.6 billion Monday in taking its finance group
public. Analysts at XXXX Investors service in XXXX said they were
keeping XXXX and its subsidiaries under review for a possible debt
downgrade, saying the company “will continue to face a significant debt
burden,'' with large slices of debt coming due, over the next 18 months.
XXXX said …
Test article with entities
removed
Prediction of Missing Entities in Text
Shares of XXXX slid 8 percent, or $1.10, to $12.65 Tuesday, as major
credit agencies said the conglomerate would still be challenged in repaying
its debts, despite raising $4.6 billion Monday in taking its finance group
public. Analysts at XXXX Investors service in XXXX said they were
keeping XXXX and its subsidiaries under review for a possible debt
downgrade, saying the company “will continue to face a significant debt
burden,'' with large slices of debt coming due, over the next 18 months.
XXXX said …
fitch goldman-sachs lehman-brother moody morgan-stanley new-york-
stock-exchange standard-and-poor tyco tyco-international wall-street
worldco
Test article with entities
removed
Actual missing entities
Prediction of Missing Entities in Text
Shares of XXXX slid 8 percent, or $1.10, to $12.65 Tuesday, as major
credit agencies said the conglomerate would still be challenged in repaying
its debts, despite raising $4.6 billion Monday in taking its finance group
public. Analysts at XXXX Investors service in XXXX said they were
keeping XXXX and its subsidiaries under review for a possible debt
downgrade, saying the company “will continue to face a significant debt
burden,'' with large slices of debt coming due, over the next 18 months.
XXXX said …
fitch goldman-sachs lehman-brother moody morgan-stanley new-york-
stock-exchange standard-and-poor tyco tyco-international wall-street
worldco
wall-street new-york nasdaq securities-exchange-commission sec merrill-
lynch new-york-stock-exchange goldman-sachs standard-and-poor
Test article with entities
removed
Actual missing entities
Predicted entities given observed words (matches
in blue)
Applications II
Faculty Browser
• System collects PDF files from websites
– UCI
– UCSD
• Applies topic model on text extracted from PDFs
• Display faculty research with topics
• Demo:
http://yarra.calit2.uci.edu/calit2/
one topic
most prolific researchers for this topic
topics this researcher works on
other researchers with similar
topical interests
one researcher
Conclusions
Human MemoryInformation
Retrieval
Finding relevant memories
Finding relevant documents
Software
Public-domain MATLAB toolbox for topic modeling on the Web:
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm
Hidden Markov Topic Model
Hidden Markov Topics Model
• Syntactic dependencies short range dependencies
• Semantic dependencies long-range
z z z z
w w w w
s s s s
Semantic state: generate words from topic model
Syntactic states: generate words from HMM
(Griffiths, Steyvers, Blei, & Tenenbaum, 2004)
HEART 0.2 LOVE 0.2SOUL 0.2TEARS 0.2JOY 0.2
z = 1 0.4
SCIENTIFIC 0.2 KNOWLEDGE 0.2WORK 0.2RESEARCH 0.2MATHEMATICS 0.2
z = 2 0.6
THE 0.6 A 0.3MANY 0.1
OF 0.6 FOR 0.3BETWEEN 0.1
0.9
0.1
0.2
0.8
0.7
0.3
Transition between semantic state and syntactic states
THE ………………………………
HEART 0.2 LOVE 0.2SOUL 0.2TEARS 0.2JOY 0.2
z = 1 0.4
SCIENTIFIC 0.2 KNOWLEDGE 0.2WORK 0.2RESEARCH 0.2MATHEMATICS 0.2
z = 2 0.6
x = 1
THE 0.6 A 0.3MANY 0.1
x = 3
OF 0.6 FOR 0.3BETWEEN 0.1
x = 2
0.9
0.1
0.2
0.8
0.7
0.3
Combining topics and syntax
THE LOVE……………………
HEART 0.2 LOVE 0.2SOUL 0.2TEARS 0.2JOY 0.2
z = 1 0.4
SCIENTIFIC 0.2 KNOWLEDGE 0.2WORK 0.2RESEARCH 0.2MATHEMATICS 0.2
z = 2 0.6
x = 1
THE 0.6 A 0.3MANY 0.1
x = 3
OF 0.6 FOR 0.3BETWEEN 0.1
x = 2
0.9
0.1
0.2
0.8
0.7
0.3
Combining topics and syntax
THE LOVE OF………………
HEART 0.2 LOVE 0.2SOUL 0.2TEARS 0.2JOY 0.2
z = 1 0.4
SCIENTIFIC 0.2 KNOWLEDGE 0.2WORK 0.2RESEARCH 0.2MATHEMATICS 0.2
z = 2 0.6
x = 1
THE 0.6 A 0.3MANY 0.1
x = 3
OF 0.6 FOR 0.3BETWEEN 0.1
x = 2
0.9
0.1
0.2
0.8
0.7
0.3
Combining topics and syntax
THE LOVE OF RESEARCH ……
HEART 0.2 LOVE 0.2SOUL 0.2TEARS 0.2JOY 0.2
z = 1 0.4
SCIENTIFIC 0.2 KNOWLEDGE 0.2WORK 0.2RESEARCH 0.2MATHEMATICS 0.2
z = 2 0.6
x = 1
THE 0.6 A 0.3MANY 0.1
x = 3
OF 0.6 FOR 0.3BETWEEN 0.1
x = 2
0.9
0.1
0.2
0.8
0.7
0.3
Combining topics and syntax
FOODFOODSBODY
NUTRIENTSDIETFAT
SUGARENERGY
MILKEATINGFRUITS
VEGETABLESWEIGHT
FATSNEEDS
CARBOHYDRATESVITAMINSCALORIESPROTEIN
MINERALS
MAPNORTHEARTHSOUTHPOLEMAPS
EQUATORWESTLINESEAST
AUSTRALIAGLOBEPOLES
HEMISPHERELATITUDE
PLACESLAND
WORLDCOMPASS
CONTINENTS
DOCTORPATIENTHEALTH
HOSPITALMEDICAL
CAREPATIENTS
NURSEDOCTORSMEDICINENURSING
TREATMENTNURSES
PHYSICIANHOSPITALS
DRSICK
ASSISTANTEMERGENCY
PRACTICE
BOOKBOOKS
READINGINFORMATION
LIBRARYREPORT
PAGETITLE
SUBJECTPAGESGUIDE
WORDSMATERIALARTICLE
ARTICLESWORDFACTS
AUTHORREFERENCE
NOTE
GOLDIRON
SILVERCOPPERMETAL
METALSSTEELCLAYLEADADAM
OREALUMINUM
MINERALMINE
STONEMINERALS
POTMININGMINERS
TIN
BEHAVIORSELF
INDIVIDUALPERSONALITY
RESPONSESOCIAL
EMOTIONALLEARNINGFEELINGS
PSYCHOLOGISTSINDIVIDUALS
PSYCHOLOGICALEXPERIENCES
ENVIRONMENTHUMAN
RESPONSESBEHAVIORSATTITUDES
PSYCHOLOGYPERSON
CELLSCELL
ORGANISMSALGAE
BACTERIAMICROSCOPEMEMBRANEORGANISM
FOODLIVINGFUNGIMOLD
MATERIALSNUCLEUSCELLED
STRUCTURESMATERIAL
STRUCTUREGREENMOLDS
Semantic topics
PLANTSPLANT
LEAVESSEEDSSOIL
ROOTSFLOWERS
WATERFOOD
GREENSEED
STEMSFLOWER
STEMLEAF
ANIMALSROOT
POLLENGROWING
GROW
GOODSMALL
NEWIMPORTANT
GREATLITTLELARGE
*BIG
LONGHIGH
DIFFERENTSPECIAL
OLDSTRONGYOUNG
COMMONWHITESINGLE
CERTAIN
THEHIS
THEIRYOURHERITSMYOURTHIS
THESEA
ANTHATNEW
THOSEEACH
MRANYMRSALL
MORESUCHLESS
MUCHKNOWN
JUSTBETTERRATHER
GREATERHIGHERLARGERLONGERFASTER
EXACTLYSMALLER
SOMETHINGBIGGERFEWERLOWER
ALMOST
ONAT
INTOFROMWITH
THROUGHOVER
AROUNDAGAINSTACROSS
UPONTOWARDUNDERALONGNEAR
BEHINDOFF
ABOVEDOWN
BEFORE
SAIDASKED
THOUGHTTOLDSAYS
MEANSCALLEDCRIED
SHOWSANSWERED
TELLSREPLIED
SHOUTEDEXPLAINEDLAUGHED
MEANTWROTE
SHOWEDBELIEVED
WHISPERED
ONESOMEMANYTWOEACHALL
MOSTANY
THREETHIS
EVERYSEVERAL
FOURFIVEBOTHTENSIX
MUCHTWENTY
EIGHT
HEYOU
THEYI
SHEWEIT
PEOPLEEVERYONE
OTHERSSCIENTISTSSOMEONE
WHONOBODY
ONESOMETHING
ANYONEEVERYBODY
SOMETHEN
Syntactic classes
BEMAKE
GETHAVE
GOTAKE
DOFINDUSESEE
HELPKEEPGIVELOOKCOMEWORKMOVELIVEEAT
BECOME
MODELALGORITHM
SYSTEMCASE
PROBLEMNETWORKMETHOD
APPROACHPAPER
PROCESS
ISWASHAS
BECOMESDENOTES
BEINGREMAINS
REPRESENTSEXISTSSEEMS
SEESHOWNOTE
CONSIDERASSUMEPRESENT
NEEDPROPOSEDESCRIBESUGGEST
USEDTRAINED
OBTAINEDDESCRIBED
GIVENFOUND
PRESENTEDDEFINED
GENERATEDSHOWN
INWITHFORON
FROMAT
USINGINTOOVER
WITHIN
HOWEVERALSOTHENTHUS
THEREFOREFIRSTHERENOW
HENCEFINALLY
#*IXTN-CFP
EXPERTSEXPERTGATING
HMEARCHITECTURE
MIXTURELEARNINGMIXTURESFUNCTION
GATE
DATAGAUSSIANMIXTURE
LIKELIHOODPOSTERIOR
PRIORDISTRIBUTION
EMBAYESIAN
PARAMETERS
STATEPOLICYVALUE
FUNCTIONACTION
REINFORCEMENTLEARNINGCLASSESOPTIMAL
*
MEMBRANESYNAPTIC
CELL*
CURRENTDENDRITICPOTENTIAL
NEURONCONDUCTANCE
CHANNELS
IMAGEIMAGESOBJECT
OBJECTSFEATURE
RECOGNITIONVIEWS
#PIXEL
VISUAL
KERNELSUPPORTVECTOR
SVMKERNELS
#SPACE
FUNCTIONMACHINES
SET
NETWORKNEURAL
NETWORKSOUPUTINPUT
TRAININGINPUTS
WEIGHTS#
OUTPUTS
NIPS Semantics
NIPS Syntax
Random sentence generation
LANGUAGE:[S] RESEARCHERS GIVE THE SPEECH[S] THE SOUND FEEL NO LISTENERS[S] WHICH WAS TO BE MEANING[S] HER VOCABULARIES STOPPED WORDS[S] HE EXPRESSLY WANTED THAT BETTER VOWEL
Nested Chinese Restaurant Process
Topic Hierarchies
• In regular topic model, no relations
between topicstopic 1
topic 2 topic 3
topic 4 topic 5 topic 6 topic 7
• Nested Chinese Restaurant Process
– Blei, Griffiths, Jordan, Tenenbaum (2004)
– Learn hierarchical structure, as well as
topics within structure
Example: Psych Review Abstracts
RESPONSESTIMULUS
REINFORCEMENTRECOGNITION
STIMULIRECALLCHOICE
CONDITIONING
SPEECHREADINGWORDS
MOVEMENTMOTORVISUALWORD
SEMANTIC
ACTIONSOCIALSELF
EXPERIENCEEMOTION
GOALSEMOTIONALTHINKING
GROUPIQ
INTELLIGENCESOCIAL
RATIONALINDIVIDUAL
GROUPSMEMBERS
SEXEMOTIONS
GENDEREMOTIONSTRESSWOMENHEALTH
HANDEDNESS
REASONINGATTITUDE
CONSISTENCYSITUATIONALINFERENCEJUDGMENT
PROBABILITIESSTATISTICAL
IMAGECOLOR
MONOCULARLIGHTNESS
GIBSONSUBMOVEMENTORIENTATIONHOLOGRAPHIC
CONDITIONINSTRESS
EMOTIONALBEHAVIORAL
FEARSTIMULATIONTOLERANCERESPONSES
AMODEL
MEMORYFOR
MODELSTASK
INFORMATIONRESULTSACCOUNT
SELFSOCIAL
PSYCHOLOGYRESEARCH
RISKSTRATEGIES
INTERPERSONALPERSONALITY
SAMPLING
MOTIONVISUAL
SURFACEBINOCULAR
RIVALRYCONTOUR
DIRECTIONCONTOURSSURFACES
DRUGFOODBRAIN
AROUSALACTIVATIONAFFECTIVEHUNGER
EXTINCTIONPAIN
THEOF
ANDTOINAIS
Generative Process
RESPONSESTIMULUS
REINFORCEMENTRECOGNITION
STIMULIRECALLCHOICE
CONDITIONING
SPEECHREADINGWORDS
MOVEMENTMOTORVISUALWORD
SEMANTIC
ACTIONSOCIALSELF
EXPERIENCEEMOTION
GOALSEMOTIONALTHINKING
GROUPIQ
INTELLIGENCESOCIAL
RATIONALINDIVIDUAL
GROUPSMEMBERS
SEXEMOTIONS
GENDEREMOTIONSTRESSWOMENHEALTH
HANDEDNESS
REASONINGATTITUDE
CONSISTENCYSITUATIONALINFERENCEJUDGMENT
PROBABILITIESSTATISTICAL
IMAGECOLOR
MONOCULARLIGHTNESS
GIBSONSUBMOVEMENTORIENTATIONHOLOGRAPHIC
CONDITIONINSTRESS
EMOTIONALBEHAVIORAL
FEARSTIMULATIONTOLERANCERESPONSES
AMODEL
MEMORYFOR
MODELSTASK
INFORMATIONRESULTSACCOUNT
SELFSOCIAL
PSYCHOLOGYRESEARCH
RISKSTRATEGIES
INTERPERSONALPERSONALITY
SAMPLING
MOTIONVISUAL
SURFACEBINOCULAR
RIVALRYCONTOUR
DIRECTIONCONTOURSSURFACES
DRUGFOODBRAIN
AROUSALACTIVATIONAFFECTIVEHUNGER
EXTINCTIONPAIN
THEOF
ANDTOINAIS