Stochastic Optimization and Variational Inference
David M. Blei
Princeton University
February 27, 2014
[Figure: patent communities labeled OPTICS, PROSTHESIS, PLASTIC AND NONMETALLIC ARTICLE SHAPING, and STOCK MATERIAL. Example patents: "Process for producing porous products" (bridgeness = 267); "Waterproof Laminate"; "Light reflectant surface in a recessed cavity substantially surrounding a compact fluorescent lamp"; "Prosthesis comprising an expansible or contractile tubular body"; "Tubular polytetrafluoroethylene implantable prostheses".]
Communities discovered in a 3.7M node network of U.S. Patents
[Gopalan and Blei, PNAS 2013]
[Figure 10 (Hoffman, Blei, Wang, and Paisley): The 15 most frequent topics from the HDP posterior on the New York Times. Each topic plot illustrates the topic's most frequent words, for example: game, season, team, players, coach; film, movie, director, television; bush, political, party, campaign, republican; war, military, iraq, troops, soldiers; business, market, companies, stock, investors; art, museum, artists, gallery, exhibition.]
Topics found in 1.8M articles from the New York Times
[Hoffman, Blei, Wang, Paisley, JMLR 2013]
[Figure, two panels: probability (axis label "prob") of populations ("pops") 1 to 7, grouped by sample population: Adygei, Balochi, BantuKenya, BantuSouthAfrica, Basque, Bedouin, BiakaPygmy, Brahui, Burusho, Cambodian, Colombian, Dai, Daur, Druze, French, Han, Han-NChina, Hazara, Hezhen, Italian, Japanese, Kalash, Karitiana, Lahu, Makrani, Mandenka, Maya, MbutiPygmy, Melanesian, Miao, Mongola, Mozabite, Naxi, Orcadian, Oroqen, Palestinian, Papuan, Pathan, Pima, Russian, San, Sardinian, She, Sindhi, Surui, Tu, Tujia, Tuscan, Uygur, Xibo, Yakut, Yi, Yoruba.]
Population analysis of 2 billion genetic measurements
[Gopalan, Hao, Blei, Storey, in preparation]
Neuroscience analysis of 220 million fMRI measurements
[Manning, Ranganath, Blei, Norman, submitted]
This talk
[Diagram: Build model, Infer hidden variables, Data, Predict & Explore]
• Customized data analysis is important to many fields.
• Pipeline separates assumptions, computation, application.
• Eases collaborative solutions to data science problems.
This talk
• Graphical models are a language for expressing assumptions about data.
• Variational methods turn inference into optimization.
• Stochastic optimization scales up and generalizes variational methods.
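To make "turn inference into optimization" concrete, the standard objective is the evidence lower bound (ELBO). The notation below is chosen here for illustration and does not appear on the slide: x is the data, z the hidden variables, and q(z; ν) a variational family with free parameters ν.

```latex
\begin{align*}
\log p(x) &= \mathcal{L}(\nu) + \mathrm{KL}\big(q(z;\nu)\,\|\,p(z \mid x)\big), \\
\mathcal{L}(\nu) &= \mathbb{E}_{q(z;\nu)}\big[\log p(x, z)\big] - \mathbb{E}_{q(z;\nu)}\big[\log q(z;\nu)\big].
\end{align*}
```

Because log p(x) does not depend on ν, maximizing the ELBO over ν is the same as minimizing the KL divergence from q to the exact posterior.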
This talk
• Introduction to variational methods
• Scaling up with stochastic variational inference [Hoffman et al., 2013]
• Generalizing with black box variational inference [Ranganath et al., 2014]
Stochastic Variational Inference (with Matt Hoffman, Chong Wang, John Paisley)
Example: Latent Dirichlet allocation
[Figure: example topics, documents, and topic proportions and assignments. Topics shown:
gene 0.04, dna 0.02, genetic 0.01, ...
life 0.02, evolve 0.01, organism 0.01, ...
brain 0.04, neuron 0.02, nerve 0.01, ...
data 0.02, number 0.02, computer 0.01, ...]
Generative process
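The generative process behind this picture can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `generate_lda_corpus`, the fixed document length, and the hyperparameter values are assumptions made here, while α, η, θ, z, w, and β follow the usual LDA notation.

```python
import numpy as np

def generate_lda_corpus(num_docs, doc_length, num_topics, vocab_size,
                        alpha=0.5, eta=0.5, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative values)."""
    rng = np.random.default_rng(seed)
    # Global variables: each topic beta_k is a distribution over the vocabulary.
    topics = rng.dirichlet(np.full(vocab_size, eta), size=num_topics)
    proportions, docs = [], []
    for _ in range(num_docs):
        # Local variables: per-document topic proportions theta_d.
        theta = rng.dirichlet(np.full(num_topics, alpha))
        # Per-word topic assignments z_{d,n}, then words w_{d,n} drawn from the assigned topic.
        z = rng.choice(num_topics, size=doc_length, p=theta)
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
        proportions.append(theta)
        docs.append(words)
    return topics, np.array(proportions), docs
```

Calling `generate_lda_corpus(5, 20, 3, 10)` returns the topics, the per-document topic proportions, and five documents of word indices.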
Example: Latent Dirichlet allocation
[Figure: topics, documents, and topic proportions and assignments]
Posterior inference
Classical variational inference
[Graphical model for LDA: α → θ_d → z_{d,n} → w_{d,n} ← β_k ← η, with plates over N words, D documents, and K topics]
• Given data, estimate the conditional distribution of the hidden variables.
  • Local variables describe per-data point hidden structure.
  • Global variables describe structure shared by all the data.
• Classical variational inference:
  • Do some local computation for each data point.
  • Aggregate these computations to re-estimate global structure.
  • Repeat.
• Inefficient, and cannot handle massive data sets.
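The three-step loop in the bullets above can be sketched for LDA as follows. This is a minimal illustration, not the speaker's code: the function names, the dense count-matrix input, the hyperparameter values, and the hand-rolled `digamma` approximation (used only to avoid a SciPy dependency) are assumptions made here. The updates themselves are the standard mean-field updates for LDA.

```python
import numpy as np

def digamma(x):
    """Approximate digamma for x > 0: shift by 6 with psi(x) = psi(x+1) - 1/x, then an asymptotic series."""
    x = np.asarray(x, dtype=float)
    shift = sum(1.0 / (x + i) for i in range(6))
    y = x + 6.0
    return (np.log(y) - 1 / (2 * y) - 1 / (12 * y**2)
            + 1 / (120 * y**4) - 1 / (252 * y**6) - shift)

def expected_log(params):
    """E[log p] under Dirichlet(params), along the last axis."""
    return digamma(params) - digamma(params.sum(axis=-1, keepdims=True))

def local_step(counts, expected_log_topics, alpha, num_iters=50):
    """Local computation for one document: fit gamma (topic proportions) and phi (assignments)."""
    gamma = np.ones(expected_log_topics.shape[0])
    for _ in range(num_iters):
        log_phi = expected_log(gamma)[:, None] + expected_log_topics
        phi = np.exp(log_phi - log_phi.max(axis=0, keepdims=True))
        phi /= phi.sum(axis=0, keepdims=True)   # normalize over topics for each word type
        gamma = alpha + phi @ counts
    return gamma, phi * counts                  # expected topic-word counts for this document

def batch_vi(count_matrix, num_topics, alpha=0.5, eta=0.5, num_sweeps=20, seed=0):
    """Classical (batch) variational inference: every sweep touches every document."""
    rng = np.random.default_rng(seed)
    num_docs, vocab_size = count_matrix.shape
    lam = rng.gamma(100.0, 0.01, size=(num_topics, vocab_size))  # global topic parameters
    gammas = np.zeros((num_docs, num_topics))
    for _ in range(num_sweeps):
        expected_log_topics = expected_log(lam)
        stats = np.zeros_like(lam)
        for d in range(num_docs):               # local computation for each data point
            gammas[d], doc_stats = local_step(count_matrix[d], expected_log_topics, alpha)
            stats += doc_stats                  # aggregate
        lam = eta + stats                       # re-estimate global structure
    return lam, gammas
```

Each sweep costs a full pass over the corpus before the topics change at all, which is the inefficiency the last bullet refers to.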
Stochastic variational inference
[Diagram: a loop over a MASSIVE DATA SET: SUBSAMPLE DATA, INFER LOCAL STRUCTURE, UPDATE GLOBAL STRUCTURE. The GLOBAL HIDDEN STRUCTURE is illustrated with the population plots (prob against pops 1 to 7) from the genetics slide.]
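The subsample, infer-local, update-global loop can be sketched for LDA as below. This is a hedged illustration under stated assumptions: the function names, the one-document subsample, the hand-rolled `digamma` approximation, and the step-size schedule ρ_t = (t + τ)^(-κ) with its default values are choices made here for the example and are not taken from the slides.

```python
import numpy as np

def digamma(x):
    """Approximate digamma for x > 0: shift by 6 with psi(x) = psi(x+1) - 1/x, then an asymptotic series."""
    x = np.asarray(x, dtype=float)
    shift = sum(1.0 / (x + i) for i in range(6))
    y = x + 6.0
    return (np.log(y) - 1 / (2 * y) - 1 / (12 * y**2)
            + 1 / (120 * y**4) - 1 / (252 * y**6) - shift)

def expected_log(params):
    """E[log p] under Dirichlet(params), along the last axis."""
    return digamma(params) - digamma(params.sum(axis=-1, keepdims=True))

def infer_local(counts, expected_log_topics, alpha, num_iters=50):
    """Fit one document's local variables; return its expected topic-word counts."""
    gamma = np.ones(expected_log_topics.shape[0])
    for _ in range(num_iters):
        log_phi = expected_log(gamma)[:, None] + expected_log_topics
        phi = np.exp(log_phi - log_phi.max(axis=0, keepdims=True))
        phi /= phi.sum(axis=0, keepdims=True)
        gamma = alpha + phi @ counts
    return phi * counts

def stochastic_vi(count_matrix, num_topics, alpha=0.5, eta=0.5,
                  num_steps=200, tau=1.0, kappa=0.7, seed=0):
    """Stochastic variational inference with one sampled document per step."""
    rng = np.random.default_rng(seed)
    num_docs, vocab_size = count_matrix.shape
    lam = rng.gamma(100.0, 0.01, size=(num_topics, vocab_size))  # global topic parameters
    for t in range(num_steps):
        d = rng.integers(num_docs)                               # subsample data
        stats = infer_local(count_matrix[d], expected_log(lam), alpha)  # infer local structure
        lam_hat = eta + num_docs * stats     # estimate as if the whole corpus looked like this document
        rho = (t + tau) ** (-kappa)          # decreasing step size (assumed schedule)
        lam = (1.0 - rho) * lam + rho * lam_hat                  # update global structure
    return lam
```

Unlike a batch sweep, each step here updates the global parameters after seeing a single document, so the cost per update does not grow with the size of the data set.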
Stochastic variational inference
Top eight words, by number of documents analyzed:
2048: systems, road, made, service, announced, national, west, language
4096: systems, health, communication, service, billion, language, care, road
8192: service, systems, health, companies, market, communication, company, billion
12288: service, systems, companies, business, company, billion, health, industry
16384: service, companies, systems, business, company, industry, market, billion
32768: business, service, companies, industry, company, management, systems, services
49152: business, service, companies, industry, services, company, management, public
65536: business, industry, service, companies, services, company, management, public
[Figure: perplexity (y-axis, 600 to 900) against documents seen (x-axis, log scale, 10^3.5 to 10^6.5) for three runs: Batch 98K, Online 98K, and Online 3.3M]
[Hoffman et al., 2010]
[Figure: New York Times topic word panels from Hoffman, Blei, Wang, and Paisley]