Ranked Recall: Efficient Classification by Learning Indices That Rank
Transcript of Ranked Recall: Efficient Classification by Learning Indices That Rank
Research
Ranked Recall: Efficient Classification by Learning Indices That Rank
Omid Madani
with Michael Connor (UIUC)
Many Category Learning (e.g. Y! Directory)

Example directory paths:
• Arts & Humanities > Photography > Magazines, Contests
• Education > History
• Business & Economy
• Recreation & Sports > Sports > Amateur > College > Basketball

Over 100,000 categories in the Yahoo! directory. Given a page, quickly categorize it. Larger still for vision, text prediction, ... (millions and beyond).
Supervised Learning

• Often two phases:
  • Training
  • Execution/Testing
• A learnt classifier f (categorizer): f(unseen instance) → class prediction(s)

[Figure: training table over features x1, x2, x3 with a class column, e.g. class 1 for (1 0 3), class 5 for (0 0 1), class 2 for (1 1 0) and (0 0 0); for an unseen x = (0 0 1), predict f(x) = Y]

• Often learn binary classifiers
Massive Learning

• Lots of ...
  • Instances (millions, unbounded, ...)
  • Dimensions (1000s and beyond)
  • Categories (1000s and beyond)
• Two questions:
  1. How to quickly categorize?
  2. How to efficiently learn to categorize efficiently?
Efficiency

1. Two phases (combined when online):
   1. Learning
   2. Classification time/deployment
2. Resource requirements:
   1. Memory
   2. Time
   3. Sample efficiency
Idea
• Cues in input may quickly narrow down possibilities => “index” categories
• Like search engine, but learn a good index
• Goal: learn to strike a good balance between accuracy and efficiency
Summary Findings

• Very fast:
  • Train time: minutes versus hours/days (compared against one-versus-rest and top-down)
  • Classification time: O(|x|)
• Memory efficient
• Simple to use (runs on a laptop)
• Competitive accuracy!
Problem Formulation
Input-Output Summary

Input: a tripartite graph (instances, features, categories)

Output (learned): an index = a sparse weighted directed bipartite graph from features to categories (a sparse matrix), where each edge from a feature f_i to a category c_j carries a weight w_ij.
Scheme
• Learn a weighted bipartite graph
• Rank categories retrieved
• For category assignment, could use rank, or define thresholds, or map scores to probabilities, etc.
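As a sketch, turning the ranked list into category assignments by rank cutoff or by score threshold might look like the following (the function and variable names are illustrative, not from the talk):

```python
def assign_categories(ranked, top_k=None, threshold=None):
    """Turn a ranked (category, score) list into assignments,
    either by rank cutoff (top_k) or by a score threshold."""
    cats = [c for c, s in ranked if threshold is None or s >= threshold]
    return cats[:top_k] if top_k is not None else cats

# Illustrative ranked output (highest score first):
ranked = [("c4", 0.50), ("c3", 0.40), ("c1", 0.10)]
by_rank = assign_categories(ranked, top_k=2)         # ["c4", "c3"]
by_score = assign_categories(ranked, threshold=0.2)  # ["c4", "c3"]
```

Mapping scores to probabilities (the third option above) would need a calibration step on held-out data, which this sketch omits.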
Three Parts of the Online Solution
• How to use the index?
• How to update (learn) it?
• When to update it?
Retrieval (Ranked Recall)

Given an instance x = {f2, f3}:
1. Features are "activated"
2. Edges are activated
3. Receiving categories are activated (accumulate scores)
4. Categories are sorted/ranked

[Figure: bipartite graph from features f1..f4 to categories c1..c5 with edge weights .40, .30, .20, .10, .10; sorted list: (c4, .50), (c3, .40), (c5, .10), (c1, .10)]

1. Like the use of inverted indices
2. Sparse dot products
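The retrieval steps amount to a sparse dot product over an inverted index; a minimal sketch (the dict-of-dicts index layout and the toy weights are illustrative, not the authors' implementation):

```python
from collections import defaultdict

def rank_categories(index, instance):
    """Score every category reachable from the instance's active
    features (a sparse dot product), then sort by score."""
    scores = defaultdict(float)
    for feature, value in instance.items():            # 1. features activate
        for cat, w in index.get(feature, {}).items():  # 2. edges activate
            scores[cat] += value * w                   # 3. categories accumulate
    return sorted(scores.items(), key=lambda kv: -kv[1])  # 4. rank

# A toy index (weights illustrative):
index = {"f2": {"c3": 0.40, "c4": 0.30},
         "f3": {"c4": 0.20, "c5": 0.10, "c1": 0.10}}
ranked = rank_categories(index, {"f2": 1.0, "f3": 1.0})
# ranked starts with ("c4", 0.50), then ("c3", 0.40)
```

Only the edges touched by active features are visited, which is what keeps classification time proportional to the instance size rather than the number of categories.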
Computing the Index
• Efficiency: Impose a constraint on every feature’s maximum out-degree
• Accuracy: Connect and compute weights so that some measure of accuracy is maximized..
Measure of Accuracy: Recall

• Measure average performance per instance
• Recall: the proportion of instances for which the right category ended up in the top k
• Recall at k = 1 (R1), 5 (R5), 10, …
• R1 = "Accuracy" when "multiclass"
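Recall at k can be computed as follows (a minimal sketch; the data is made up for illustration):

```python
def recall_at_k(rankings, true_labels, k):
    """Proportion of instances whose true category appears in the
    top k of that instance's ranked category list."""
    hits = sum(1 for ranked, true in zip(rankings, true_labels)
               if true in ranked[:k])
    return hits / len(true_labels)

# Ranked outputs for three instances (illustrative):
rankings = [["c4", "c3", "c1"], ["c2", "c4"], ["c1", "c5", "c2"]]
truth = ["c4", "c4", "c2"]
r1 = recall_at_k(rankings, truth, 1)  # only the first instance is a top-1 hit
r5 = recall_at_k(rankings, truth, 5)  # all three true categories are in the top 5
```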
Computational Complexity

• NP-hard!
• The problem: given a finite set of instances (Boolean features), exactly one category per instance, is there an index with max out-degree 1 such that R1 on the training set exceeds a threshold t?
• Reduction from Set Cover
• Approximation? (not known)
How About Practice?

• Devised two main learning algorithms:
  • IND treats features independently.
  • Feature Normalize (FN) doesn't make an independence assumption; it's online.
• Only non-negative weights are learned.
Feature Normalize (FN) Algorithm

• Begin with an empty index
• Repeat:
  • Input an instance (features + categories), then retrieve and rank candidate categories
  • If the margin is not met, update the index
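A simplified rendering of this loop (a sketch under assumptions: a fixed margin threshold, and a placeholder additive update; the full FN update also normalizes each feature's weights and drops small ones, as the updating slides describe):

```python
def train_fn(instances, margin_threshold=0.1):
    """One pass of the FN outer loop: retrieve and rank, check the
    margin, and update only when the margin is not met."""
    index = {}  # feature -> {category: weight}; begins empty
    for features, true_cat in instances:
        # Retrieve and rank candidates: sparse dot product.
        scores = {}
        for f, v in features.items():
            for c, w in index.get(f, {}).items():
                scores[c] = scores.get(c, 0.0) + v * w
        s_true = scores.get(true_cat, 0.0)
        s_neg = max((s for c, s in scores.items() if c != true_cat),
                    default=0.0)
        if s_true - s_neg < margin_threshold:
            # Placeholder additive strengthening only.
            for f in features:
                index.setdefault(f, {})
                index[f][true_cat] = index[f].get(true_cat, 0.0) + 1.0
    return index

idx = train_fn([({"f1": 1.0}, "c1"), ({"f1": 1.0}, "c1")])
# The second instance already meets the margin, so only one update happens.
```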
Three Parts (Online Setting)
• How to use the index?
• How to update it?
• When to update it?
Index Updating

• For each active feature:
  • Strengthen the weight between the active feature and the true category
  • Weaken the feature's other connections
• Strengthening = increase the weight by addition or multiplication
Updating

[Figure: bipartite graph from features f1..f4 to categories c1..c5; an active feature f ∈ x is strengthened toward the true category c ∈ C_x]

1. Identify the connection
2. Increase its weight
3. Normalize/weaken the other weights
4. Drop small weights
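The four steps might be sketched like this (assumptions: additive strengthening with increment 1, normalization over the feature's outgoing weights; the out-degree cap and minimum weight are illustrative constants):

```python
def update_feature(index, feature, true_cat, inc=1.0,
                   max_outdegree=25, w_min=0.01):
    """Strengthen feature -> true_cat, weaken the feature's other
    edges by renormalizing, then drop weak or excess edges."""
    edges = index.setdefault(feature, {})
    # 1-2. Identify the connection and increase its weight.
    edges[true_cat] = edges.get(true_cat, 0.0) + inc
    # 3. Normalize from the feature's side: the weights sum to 1, so
    #    every other edge is implicitly weakened (no explicit demotion).
    total = sum(edges.values())
    for c in edges:
        edges[c] /= total
    # 4. Drop small weights, and enforce the max out-degree.
    for c in [c for c, w in edges.items() if w < w_min]:
        del edges[c]
    if len(edges) > max_outdegree:
        keep = sorted(edges, key=edges.get, reverse=True)[:max_outdegree]
        index[feature] = {c: edges[c] for c in keep}

index = {"f1": {"c2": 1.0}}
update_feature(index, "f1", "c1")
# f1 -> c1 is strengthened; f1 -> c2 is weakened to 0.5 by normalization.
```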
Three Parts
• How to use an index?
• How to update it?
• When to update it?
A Tradeoff
1. To achieve stability (helps accuracy), we need to keep updating (think single feature scenario)
2. To “fit” more instances, we need to stop updates on instances that we get “right”
Use of margin threshold strikes a balance.
Margin Definition

• Margin = score of the true positive category MINUS score of the highest-ranked negative category
• Choice of margin threshold:
  • Fixed, e.g. 0, 0.1, 0.5, …
  • Online average (e.g. average of the last 10,000 margins + 0.1)
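Both the margin and the online-average threshold can be sketched as follows (the class name is illustrative; the window size and offset defaults are the slide's example values):

```python
from collections import deque

def margin(scores, true_cat):
    """Score of the true category minus the score of the
    highest-ranked negative category."""
    s_true = scores.get(true_cat, 0.0)
    s_neg = max((s for c, s in scores.items() if c != true_cat),
                default=0.0)
    return s_true - s_neg

class OnlineMarginThreshold:
    """Threshold = mean of the last `window` observed margins + offset."""
    def __init__(self, window=10000, offset=0.1):
        self.recent = deque(maxlen=window)
        self.offset = offset
    def observe(self, m):
        self.recent.append(m)
    def value(self):
        if not self.recent:
            return self.offset  # nothing observed yet
        return sum(self.recent) / len(self.recent) + self.offset
```

An update would fire whenever the margin of the current instance falls below the threshold's current value.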
Salient Aspects of FN

• "Differentially" updates; attempts to improve the retrieved ranking (in "context")
• Normalizes, but from the "feature's side"
• No explicit weight demotion/punishment! (normalization/weakening achieves demotion/reordering)
• Memory- and efficiency-conscious design from the outset
• Very dynamic/adaptive:
  • Edges added and dropped
  • Weights adjusted, categories reordered
• Extensions/variations exist (e.g. each feature's out-degree may adjust dynamically)
Domain Statistics

Domain                    # of instances   # of features   |C|     L (avg vector length)   Cavg (avg labels per x)
Web                       70k              685k            14k     210                     1
Ads                       369k             301k            12.6k   27                      1.4
Reuters RCV1 (industry)   23k              47k             414     76                      2.08
Jane Austin               749k             299k            17.4k   15.1                    1

Smaller domains: Reuters-21578 (10 categories, ~10k instances), 20 Newsgroups (20 categories, 20k instances), and industry (104 categories, ~10k instances).

• Experiments are the average of 10 runs; each run is a single pass, with 90% for training and 10% held out
• |C| is the number of classes, L is the average vector length, and Cavg is the average number of categories per instance
Smaller Domains

• Compared against Keerthi and DeCoste, 06 (fast linear SVM)
• Max out-degree = 25, min allowed weight = 0.01; tested with margins 0, 0.1, and 0.5, and up to 10 passes
• 90/10 random splits

[Plot: 10 categories, 10k instances]
Three Smaller Domains

[Plot: 20 categories, 20k instances]
[Plot: 104 categories, 10k instances]
3 Large Data Sets (top-down comparisons)

[Plots: ~500 categories, 20k instances; ~12.6k categories, ~370k instances; ~14k categories, ~70k instances]
Accuracy vs. Max Out-Degree

[Plot: accuracy vs. max out-degree allowed, for Web page categorization, Ads, and RCV1]
Accuracy vs. Passes and Margin

[Plot: accuracy vs. # of passes]
Related Work and Discussion

• Multiclass learning/categorization algorithms (top-down, nearest neighbors, perceptron, Naïve Bayes, MaxEnt, SVMs, online methods, ...)
• Speed-up methods (trees, indices, ...)
• Feature selection/reduction
• Evaluation criteria
• Fast categorization in the natural world
• Prediction games! (see poster)
Summary

• A scalable supervised learning method for huge class sets (and instances, ...)
• Idea: learn an index (a sparse weighted bipartite graph mapping features to categories)
• Online, time- and memory-efficient algorithms
• Current/future: more algorithms, theory, other domains/applications, ...