
Ranked Recall: Efficient Classification by Learning Indices That Rank

Omid Madani

with Michael Connor (UIUC)

Many Category Learning (e.g. Y! Directory)

[Figure: a fragment of the Yahoo! Directory hierarchy, e.g. Arts & Humanities > Photography > Magazines, Contests; Education > History; Business & Economy; Recreation & Sports > Sports > Amateur > College > Basketball]

• Over 100,000 categories in the Yahoo! directory
• Given a page, quickly categorize it…
• Even larger category sets for vision, text prediction, ... (millions and beyond)


Supervised Learning

• Often two phases:
  • Training
  • Execution/Testing
• A learnt classifier f (categorizer): f(unseen instance) → class prediction(s)

[Figure: a small training table with feature columns x1, x2, x3 and a class column; the learnt classifier f predicts the class of an unseen instance x, i.e. f(x) = y]

• Often learn binary classifiers


Massive Learning

• Lots of ...
  • Instances (millions, unbounded..)
  • Dimensions (1000s and beyond)
  • Categories (1000s and beyond)
• Two questions:
  1. How to quickly categorize?
  2. How to efficiently learn to categorize efficiently?


Efficiency

1. Two phases (combined when online):
   1. Learning
   2. Classification time/deployment
2. Resource requirements:
   1. Memory
   2. Time
   3. Sample efficiency


Idea

• Cues in the input may quickly narrow down the possibilities => “index” the categories

• Like a search engine, but learn a good index

• Goal: learn to strike a good balance between accuracy and efficiency


Summary Findings

• Very fast:
  • Train time: minutes versus hours/days (compared against one-versus-rest and top-down)
  • Classification time: O(|x|)?
• Memory efficient
• Simple to use (runs on a laptop..)
• Competitive accuracy!


Problem Formulation

Input-Output Summary

• Input: a tripartite graph over instances, features, and categories
• Output (learned): an index = a sparse weighted directed bipartite graph from features to categories (a sparse matrix), with weight w_ij on the edge from feature f_i to category c_j

[Figure: the instances-features-categories tripartite input graph, and the learned features-to-categories index, e.g. an edge from f_2 to c_1 with weight w_21]
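As a rough illustration (my own sketch, not the authors' code), such an index can be kept as a sparse mapping from each feature to its weighted out-edges; all names and weights below are hypothetical:

```python
from collections import defaultdict

# Hypothetical sparse index: feature -> {category: weight}.
# Each inner dict holds one feature's out-edges, bounded by a max out-degree.
index = defaultdict(dict)

index["f2"]["c1"] = 0.4   # an edge f2 -> c1 with weight w_21
index["f2"]["c3"] = 0.3
index["f3"]["c4"] = 0.2

print(index["f2"])        # -> {'c1': 0.4, 'c3': 0.3}
```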


Scheme

• Learn a weighted bipartite graph

• Rank categories retrieved

• For category assignment, could use rank, or define thresholds, or map scores to probabilities, etc.


Three Parts of the Online Solution

• How to use the index?

• How to update (learn) it?

• When to update it?


Retrieval (Ranked Recall)

Given an instance x = {f2, f3}:
1. Features are “activated”
2. Edges are activated
3. Receiving categories are activated
4. Categories are sorted/ranked

[Figure: features f1..f4 connected to categories c1..c5 with edge weights such as 0.40, 0.30, 0.20, 0.10; activating f2 and f3 yields a sorted list such as (c4, .50), (c3, .40), (c5, .10), (c1, .10)]

• Like the use of inverted indices
• Sparse dot products
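A minimal sketch of this retrieval step (my own illustration, not the authors' implementation), assuming the index is a dict from features to weighted out-edges; scoring reduces to sparse dot products over the active features only:

```python
from collections import defaultdict

def ranked_recall(x, index):
    """Rank categories for instance x (dict: feature -> value) using a sparse
    feature-to-category index (dict: feature -> {category: weight}).
    Only categories reachable from x's active features receive a score."""
    scores = defaultdict(float)
    for feature, value in x.items():                              # 1. activate features
        for category, weight in index.get(feature, {}).items():   # 2. activate edges
            scores[category] += value * weight                    # 3. accumulate scores
    # 4. sort/rank categories by score, highest first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: x = {f2, f3} with a small index (weights are made up)
index = {"f2": {"c4": 0.4, "c3": 0.3}, "f3": {"c4": 0.1, "c1": 0.1}}
print(ranked_recall({"f2": 1.0, "f3": 1.0}, index))
# -> [('c4', 0.5), ('c3', 0.3), ('c1', 0.1)]
```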


Computing the Index

• Efficiency: Impose a constraint on every feature’s maximum out-degree

• Accuracy: Connect and compute weights so that some measure of accuracy is maximized..


Measure of Accuracy: Recall

• Measure average performance per instance
• Recall: the proportion of instances for which the right category ended up in the top k
• Recall at k = 1 (R1), 5 (R5), 10, …
• R1 = “accuracy” when “multiclass”
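A small helper (mine, not from the talk) showing how recall at k can be computed in the single-label case:

```python
def recall_at_k(rankings, true_categories, k):
    """Fraction of instances whose correct category appears in the top k.
    rankings: one ranked list of categories (best first) per instance.
    true_categories: the single correct category for each instance."""
    hits = sum(1 for ranked, truth in zip(rankings, true_categories)
               if truth in ranked[:k])
    return hits / len(true_categories)

# Example: R1 ("accuracy" in the multiclass case) and R5
rankings = [["c4", "c3", "c1"], ["c2", "c4", "c5"]]
truths = ["c4", "c5"]
print(recall_at_k(rankings, truths, 1))   # 0.5
print(recall_at_k(rankings, truths, 5))   # 1.0
```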


Computational Complexity

• NP-hard!
• The problem: given a finite set of instances (Boolean features), with exactly one category per instance, is there an index with max out-degree 1 such that R1 on the training set is greater than a threshold t?
• Reduction from set cover
• Approximation? (not known)


How About Practice?

• Devised two main learning algorithms:
  • IND treats features independently.
  • Feature Normalize (FN) doesn’t make an independence assumption; it’s online.
• Only non-negative weights are learned.


Feature Normalize (FN) Algorithm

• Begin with an empty index
• Repeat:
  • Input an instance (features + categories), then retrieve and rank candidate categories
  • If the margin is not met, update the index (sketched below)
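A rough sketch of this online loop (my own reading of the slide; the helper names `retrieve` and `update_index` are hypothetical stand-ins for the retrieval and update steps on the neighboring slides):

```python
def train_fn(stream, retrieve, update_index, margin_threshold=0.1):
    """Feature Normalize-style online loop, as sketched above (not the authors'
    exact code). `stream` yields (features, true_categories) pairs."""
    index = {}                                            # begin with an empty index
    for features, true_cats in stream:
        ranking = retrieve(features, index)               # ranked (category, score) pairs
        pos = max((s for c, s in ranking if c in true_cats), default=0.0)
        neg = max((s for c, s in ranking if c not in true_cats), default=0.0)
        if pos - neg < margin_threshold:                  # margin not met
            update_index(index, features, true_cats)      # -> update the index
    return index
```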


Three Parts (Online Setting)

• How to use the index?

• How to update it?

• When to update it?


Index Updating

• For each active feature:
  • Strengthen the weight between the active feature and the true category
  • Weaken the feature’s other connections
• Strengthening = increase the weight by addition or multiplication


Updating

[Figure: for an active feature f ∈ x and the true category in C_x, in the features-to-categories graph (f1..f4 to c1..c5)]

1. Identify the connection
2. Increase its weight
3. Normalize/weaken the other weights
4. Drop small weights
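For one active feature, the update could look roughly like the sketch below (additive strengthening; the parameter names and exact normalization are my assumptions, not the authors' precise rule):

```python
def update_feature(out_edges, true_cat, boost=1.0, w_min=0.01, d_max=25):
    """Update one active feature's out-edges (dict: category -> weight):
    strengthen the edge to the true category, renormalize so the other edges
    weaken, drop tiny weights, and keep at most d_max edges."""
    out_edges[true_cat] = out_edges.get(true_cat, 0.0) + boost   # 1-2. strengthen
    total = sum(out_edges.values())
    for c in list(out_edges):
        out_edges[c] /= total                  # 3. normalize from the feature's side
        if out_edges[c] < w_min:               # 4. drop small weights
            del out_edges[c]
    if len(out_edges) > d_max:                 # enforce the max out-degree
        keep = set(sorted(out_edges, key=out_edges.get, reverse=True)[:d_max])
        for c in list(out_edges):
            if c not in keep:
                del out_edges[c]
    return out_edges

# Example: a feature's edges after seeing the true category c4
print(update_feature({"c3": 0.6, "c4": 0.4}, "c4"))
# -> roughly {'c3': 0.3, 'c4': 0.7}
```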


Three Parts

• How to use an index?

• How to update it?

• When to update it?


A Tradeoff

1. To achieve stability (helps accuracy), we need to keep updating (think single feature scenario)

2. To “fit” more instances, we need to stop updates on instances that we get “right”

Use of a margin threshold strikes a balance.


Margin Definition

• Margin = score of the true positive category MINUS score of the highest-ranked negative category
• Choice of margin threshold:
  • Fixed, e.g. 0, 0.1, 0.5, …
  • Online average (e.g. the average of the last 10,000 margins + 0.1)
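The “online average” choice could be tracked with something like this (a sketch under assumed names, not the talk's code):

```python
from collections import deque

class OnlineMarginThreshold:
    """Threshold = average of the last `window` observed margins, plus an offset."""
    def __init__(self, window=10000, offset=0.1):
        self.recent = deque(maxlen=window)   # sliding window of recent margins
        self.offset = offset

    def observe(self, margin):
        self.recent.append(margin)

    def value(self):
        if not self.recent:
            return self.offset
        return sum(self.recent) / len(self.recent) + self.offset
```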


Salient Aspects of FN

• “Differentially” updates; attempts to improve the retrieved ranking (in “context”)
• Normalizes, but from the “feature’s side”
• No explicit weight demotion/punishment! (normalization/weakening achieves demotion/reordering..)
• Memory/efficiency conscious design from the outset
• Very dynamic/adaptive:
  • Edges added and dropped
  • Weights adjusted, categories reordered
• Extensions/variations exist (e.g. each feature’s out-degree may adjust dynamically)


Domain Statistics

[Table: per-domain statistics (number of instances, number of features, |C|, average vector length, average number of labels per instance) for the Reuters 21578, 20 Newsgroups, industry sector, Reuters RCV1, Ads, Web, and Jane Austen domains]

• Experiments are the average of 10 runs; each run is a single pass, with 90% of the data used for training and 10% held out
• |C| is the number of classes, L is the average vector length, and Cavg is the average number of categories per instance

Smaller Domains

• Comparison: Keerthi and DeCoste, 2006 (fast linear SVM)
• Max out-degree = 25, minimum allowed weight = 0.01; tested with margins 0, 0.1, and 0.5, and up to 10 passes
• 90-10 random splits

[Figure: accuracy results on a domain with 10 categories and 10k instances]

Three Smaller Domains

[Figure: accuracy results on a domain with 20 categories and 20k instances]

Three Smaller Domains

[Figure: accuracy results on a domain with 104 categories and 10k instances]

3 Large Data Sets (top-down comparisons)

[Figure: accuracy results on three large data sets: ~500 categories with 20k instances, ~12.6k categories with ~370k instances, and ~14k categories with ~70k instances]

Accuracy vs. Max Out-Degree

[Figure: accuracy as a function of the maximum allowed out-degree, for web page categorization, Ads, and RCV1]

Accuracy vs. Passes and Margin

[Figure: accuracy as a function of the number of passes, for different margin settings]


Related Work and Discussion

• Multiclass learning/categorization algorithms (top-down, nearest neighbors, perceptron, Naïve Bayes, MaxEnt, SVMs, online methods, ..)
• Speed-up methods (trees, indices, …)
• Feature selection/reduction
• Evaluation criteria
• Fast categorization in the natural world
• Prediction games! (see poster)


Summary

• A scalable supervised learning method for huge class sets (and instances, ..)
• Idea: learn an index (a sparse weighted bipartite graph mapping features to categories)
• Online time/memory efficient algorithms
• Current/future: more algorithms, theory, other domains/applications, ..