Transcript of: Stable Coactive Learning via Perturbation (ICML 2013)

Page 1:

Stable Coactive Learning via Perturbation

Karthik Raman¹, Thorsten Joachims¹, Pannaga Shivaswamy², Tobias Schnabel³

¹ Cornell University, {karthik,tj}@cs.cornell.edu

² AT&T Research, [email protected]

³ Stuttgart University, [email protected]

June 19, 2013


Page 2:

Coactive Learning

Learning model (repeat forever):

System receives context x_t.

System makes prediction y_t.

Regret ← Regret + U(x_t, y*_t) − U(x_t, y_t)

System gets feedback, one of:
Full information: U(x_t, y^(1)), U(x_t, y^(2)), ...
Bandit: U(x_t, y_t)
Coactive: U(x_t, ȳ_t) ≥_α U(x_t, y_t)

e.g., a search engine: user query → ranking → user utility.

Full-information and bandit utility values are unrealistic for users to provide; coactive feedback, in contrast, can be inferred implicitly (e.g., from clicks).

The Preference Perceptron has average regret O(1/(α√T)) for linear utility U(x, y) = w*ᵀ φ(x, y).
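
To make the protocol concrete, here is a minimal Python sketch of one pass through the coactive loop. Every callable (get_context, predict, user_feedback, update, true_utility, best_prediction) is a hypothetical stand-in; only the interaction pattern on this slide (predict y_t, receive an improved ȳ_t, accumulate regret against y*_t) is taken from it.

def run_coactive_protocol(rounds, get_context, predict, user_feedback,
                          update, true_utility, best_prediction):
    # Sketch of the coactive learning protocol. true_utility and
    # best_prediction exist only for regret bookkeeping; the learner
    # never sees them.
    regret = 0.0
    for t in range(rounds):
        x = get_context(t)           # system receives context x_t
        y = predict(x)               # system makes prediction y_t
        y_star = best_prediction(x)  # optimal y*_t under the true utility
        regret += true_utility(x, y_star) - true_utility(x, y)
        ybar = user_feedback(x, y)   # coactive: U(x, ybar) >=_alpha U(x, y)
        update(x, y, ybar)           # learner update, e.g. a perceptron step
    return regret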



Page 12:

User Study: Learning Rankings using Perceptron

Preference Perceptron algorithm:
1. Initialize weight vector w_1 ← 0.
2. Given context x_t, present y_t ← argmax_y w_tᵀ φ(x_t, y).
3. Observe clicks and construct feedback ranking ȳ_t.
4. w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
5. Repeat from step 2.

[Illustration: clicks on the presented ranking (y) yield the feedback ranking (ȳ).]
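
The following NumPy sketch mirrors steps 1, 2, and 4 above; the feature map phi(x, y) and the candidate generator candidates(x) are assumed interfaces, not part of the slide.

import numpy as np

class PreferencePerceptron:
    def __init__(self, dim, phi, candidates):
        self.w = np.zeros(dim)        # step 1: w_1 <- 0
        self.phi = phi                # assumed: phi(x, y) -> feature ndarray
        self.candidates = candidates  # assumed: candidates(x) -> iterable of rankings

    def present(self, x):
        # step 2: y_t <- argmax_y w_t^T phi(x_t, y)
        return max(self.candidates(x), key=lambda y: self.w @ self.phi(x, y))

    def update(self, x, y, ybar):
        # step 4, with ybar the feedback ranking built from clicks (step 3):
        # w_{t+1} <- w_t + phi(x_t, ybar_t) - phi(x_t, y_t)
        self.w = self.w + self.phi(x, ybar) - self.phi(x, y)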

On a live search engine.
Goal: learn a ranking function from user clicks.
Interleaved comparison against a hand-tuned baseline.
A win ratio of 1 means no better than the baseline (higher = better).
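
As a rough sketch of the metric (the slide does not spell out the interleaving credit assignment, so the encoding below is an assumption): per query, whichever ranker's results attract more clicks in the interleaved list wins that query, and the win ratio is wins over losses.

def win_ratio(outcomes):
    # outcomes: per-query winners of the interleaved comparison, each
    # "learned", "baseline", or "tie" (hypothetical encoding). A ratio
    # of 1 means the learned ranker only ties the hand-tuned baseline.
    wins = sum(1 for o in outcomes if o == "learned")
    losses = sum(1 for o in outcomes if o == "baseline")
    return wins / losses if losses else float("inf")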

[Figure: User Study Results. Win ratio (0.8 to 2.2) vs. number of iterations (0 to 30000) for the Preference Perceptron.]

Perceptron performs poorly!



Page 19:

Perturbed Preference Perceptron:
1. Initialize weight vector w_1 ← 0.
2. Given context x_t, compute ŷ_t ← argmax_y w_tᵀ φ(x_t, y).
3. Present y_t ← Perturb(ŷ_t) (randomly swap adjacent pairs).
4. Observe clicks and construct feedback ranking ȳ_t.
5. w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
6. Repeat from step 2.
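
The only new ingredient is the Perturb step. Below is a minimal sketch under one plausible reading of "randomly swap adjacent pairs": disjoint adjacent position pairs are each swapped independently with probability one half. The pairing scheme and swap probability are assumptions, not taken from the slide.

import random

def perturb(ranking, p=0.5):
    # Step 3: randomly swap disjoint adjacent pairs (positions 0/1, 2/3, ...),
    # each independently with probability p. The learner's update in step 5
    # uses the returned, presented ranking y_t, so clicks are interpreted
    # relative to what the user actually saw.
    y = list(ranking)
    for i in range(0, len(y) - 1, 2):
        if random.random() < p:
            y[i], y[i + 1] = y[i + 1], y[i]
    return y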

[Figure: User Study Results. Win ratio (0.8 to 2.2) vs. number of iterations (0 to 30000) for the Preference Perceptron and the Perturbed Preference Perceptron.]

[Illustration: Perturb maps the predicted ranking (ŷ) to the presented ranking (y).]


Page 20:

Please come to our poster

I will tell you:

Why does the preference perceptron perform poorly?
Why does perturbation fix the problem?
What are the regret bounds for the algorithm?
How do we do this more generally for non-ranking problems?
