Paroma Varma, Bryan He, PayalBajaj, Nishith Khandwala...
Transcript of Paroma Varma, Bryan He, PayalBajaj, Nishith Khandwala...
Inferring Generative Model Structure with Static Analysis Paroma Varma, Bryan He, Payal Bajaj, Nishith Khandwala, Imon Banerjee, Daniel Rubin, Christopher Ré
MotivationsSummary
• Generative Models to Label Training DataUse generative models to combine sources of weak supervision to assign noisy labels
• Complex Dependencies among SourcesSources of labels are rarely independent
• Inferring Model StructureUse static analysis to infer dependencies and encode in generative model structure
Heuristic Structure• Domain Specific Primitives (DSPs)
Interpretable characteristics of raw data
• Heuristic Functions (HFs)Programmatic rules that output noisy labels
MotivationsStatic Analysis• Shared Input Sharing primitives as inputs
leads to explicit dependencies
• Compositions: Primitives composed of others can lead to implicit dependencies
Statistical Modeling• HF Dependency Represents the
dependencies found using static analysis
• DSP Similarity Represents the learned correlations among the DSPs
MotivationsTheoretical Results
• Learning DependenciesLearning k-degree dependencies among 𝑛heuristics requires 𝑂 𝑛#$%log𝑛 samples
• Inferring DependenciesAnalyzing the code can infer the dependencies among heuristics without data
Given dependencies, learning heuristic accuracies requires 𝑂 𝑛 log𝑛 samples
MotivationsExperimental Results
p1𝜙%+,
𝜙-+,
𝜙.+,
Y 𝜙/01
λ1
λ2
λ3
p2
p3
𝜙-233
𝜙%233
𝜙.233
P.w
P.h
P.yλ1
λ2
λ3
DSP HF
Shared Input, Explicit Dep.
Composition, Implicit Dep.
𝜙4+,
𝜙5233
𝜙/01
YTrue Label
Accuracy
HFDependency
DSP Similarity
def λ_1(P.y):if P.y[‘person’]>= P.y[‘bike’]:
return Trueelse:
return False
def λ_2(P.h):if P.h[‘person’] >= P.h[‘bike’]:
return Trueelse:
return False
def λ_3(P.area):if P.area[‘person’]
<= 2*P.area[‘bike’]:return True
else:return False
• Inferring dependencies outperforms learning dependencies
• Outperforms fully supervised model with additional noisy training labels
* reports F1 scores, rest in accuracy (%)