Download - Rethinking Machine Learning in the 21st Century: From ...mahadeva/Site/msr-june11-final.pdf · Applications, Results . Normal Cone-F(x*) F(x*) x x-x* x* Feasible set K Optimization

Transcript
Page 1

Rethinking Machine Learning in the 21st Century: From Optimization to Equilibration

Sridhar Mahadevan

Autonomous Learning Laboratory

Research funded in part by the National Science Foundation, NASA and AFOSR

Page 2

(Almost) Dimension-Free optimization

To

(Almost) Dimension-Free equilibration

Page 3

http://all.cs.umass.edu/pubs.shtml

1981-2014

Page 4

Lab Directors

Thomas Boucher

CJ Carey

Bruno Castro da Silva

William Dabney

Stefan Dernbach

Kimberly Ferguson

Ian Gemp

Stephen Giguere

Thomas Helmuth

Nicholas Jacek

Bo Liu

Clemens Rosenbaum

Andrew Stout

Philip Thomas

Chris Vigorito

Current PhD Students

Barto (Emeritus)

Bo

Ian

Will

Phil

Steve

Nick

Tommy

CJ

Page 5

Transfer Learning on Mars (Darby Dyar, Mount Holyoke; Thomas Boucher, Clifton Carey, Stephen Giguere, UMass)

Same laser on Earth as on Mars

Curiosity zapping a rock with a laser

Page 6

Low-Dimensional Representation Discovery

Mixture of low-dimensional (non-Euclidean) spaces

Original Data

New dimensionality reduction method

Page 7

Optimization: $\min f(x)$, $x$ in feasible set $K$

Equilibration: $\langle F(x^*), x - x^* \rangle \ge 0, \ \forall x \in K$

(Stampacchia, 1960s)

Page 8

Part I: Some (Personal) History and Motivation

Page 9

Iravatham Mahadevan ஐராவதம் மகாதேவன்

5th century BC Tamil Brahmi

Early Tamil Epigraphy from the Earliest Times to the Sixth Century A.D.

Edited and translated by Iravatham Mahadevan

Harvard Oriental Series 62

Page 10

Iravatham Mahadevan, June 2014

at 85

Obituaries | PASSINGS Murray Emeneau, 101; Founded UC Berkeley Linguistics Department

Murray Barnson Emeneau, 101, an expert in Sanskrit and Dravidian languages who founded the UC Berkeley Linguistics Department, died Aug. 29 in his sleep of natural causes at his

Berkeley home.

1904-2005

Page 11

Indus Script

Great Bath, Mohenjo-daro, 2500 B.C.

Page 12

Published Online April 23 2009

Science 29 May 2009: Vol. 324 no. 5931 p. 1165

DOI: 10.1126/science.1170391

BREVIA: Entropic Evidence for Linguistic Structure in the Indus Script. Rajesh P. N. Rao, Nisha Yadav, Mayank N. Vahia, Hrishikesh Joglekar, R. Adhikari, Iravatham Mahadevan

DNA

Fortran

Language

Is the Indus Script really a language?

Page 13

Entia non sunt multiplicanda sine necessitate

William of Ockham

14th century logician and Franciscan friar

Bertrand Russell’s explication of William’s philosophy

Page 14

Herbert Simon

Why should machines learn?

Learning denotes changes in a system that are adaptive in that they enable the system

to do the same task or tasks drawn from the same population more efficiently and

more effectively the next time around

Page 15

Learning to Drive

Neural Network

ALVINN learns from a human driver

Can drive on actual highways at 65 miles per hour!

Page 16

Søren Kierkegaard 19th century Danish philosopher

Life must be lived going forwards but can only be understood

going backwards

Page 17

Arthur Samuel's Checker Playing Program

(IBM 700, 1956)

First demo of ML!

IBM’s T.J. Watson said it would increase IBM stock by 10 points

and it did!

Page 18

> 16,000 citations

MIT Press

Temporal Difference Learning Algorithm

http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node1.html

Andy Barto

Richard Sutton

Page 19

Least Squares

Temporal Difference Learning

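Dedicated to Andrew Barto and Richard Sutton for inspiring a generation of researchers to the study of reinforcement learning.

Algorithm 1 TD (1984)

(1) $\delta_t = r_t + \gamma \phi_t'^{\top} \theta_t - \phi_t^{\top} \theta_t$

(2) $\theta_{t+1} = \theta_t + \beta_t \delta_t \phi_t$

Algorithm 2 GTD2-MP (2014)

(1) $w_{t+\frac{1}{2}} = w_t + \beta_t (\delta_t - \phi_t^{\top} w_t) \phi_t$,
    $\theta_{t+\frac{1}{2}} = \mathrm{prox}_{\alpha_t h}\big(\theta_t + \alpha_t (\phi_t - \gamma \phi_t') (\phi_t^{\top} w_t)\big)$

(2) $\delta_{t+\frac{1}{2}} = r_t + \gamma \phi_t'^{\top} \theta_{t+\frac{1}{2}} - \phi_t^{\top} \theta_{t+\frac{1}{2}}$

(3) $w_{t+1} = w_t + \beta_t (\delta_{t+\frac{1}{2}} - \phi_t^{\top} w_{t+\frac{1}{2}}) \phi_t$,
    $\theta_{t+1} = \mathrm{prox}_{\alpha_t h}\big(\theta_t + \alpha_t (\phi_t - \gamma \phi_t') (\phi_t^{\top} w_{t+\frac{1}{2}})\big)$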

Page 20
Page 21

TD-Learning Fails (not always, but predictably!)

TD diverges

Baird counterexample

Page 22

Optimization by Gradient Descent TD

1984-2014: Can TD be converted into a "true" gradient method?

Min f(x), x in feasible set K

Latest attempt: Sutton, 2009. Introduces gradient objectives but solves without true gradient computation

Page 23

Part II: Equilibration in Reinforcement Learning

Page 24

Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces

arXiv, May 26, 2014 (126 pages)

Sridhar Mahadevan, Bo Liu, Philip Thomas, Will Dabney, Stephen Giguere, Nicholas Jacek, Ian Gemp, Ji Liu

Page 25

Proximal Reinforcement Learning in Primal-Dual Spaces

Why Proximal RL?

Scalability

Sparsity

Safety

Reliability

Page 26

Søren Kierkegaard 19th century Danish philosopher

Life must be lived going forwards but can only be understood going backwards

In the Dual Space!

Page 27

(Almost) Dimension-free optimization

Page 28

Mirror Maps (Nemirovski and Yudin, 1980s; Bubeck, 2014)

[Figure: mirror descent. The mirror map $\nabla\Phi$ takes $x_t$ from the primal space (the set $X$ inside $D$) to $\nabla\Phi(x_t)$ in the dual space $\mathbb{R}^n$; a gradient step there gives $\nabla\Phi(y_{t+1})$, and $\nabla\Phi^*$ maps back to the primal space, where $x_{t+1}$ lies in $X$.]

Page 29

Variational Inequality (Stampacchia, 1960s)

[Figure: feasible set K with the solution x*, a point x, the vector x - x*, F(x*), and -F(x*) in the normal cone at x*.]

$\langle F(x^*), x - x^* \rangle \ge 0, \ \forall x \in K$
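The link between the two formulations can be checked numerically. The following is a minimal sketch, assuming an illustrative objective $f(x) = (x_0 - 2)^2 + (x_1 + 1)^2$, the box $K = [0, 1]^2$, and $F = \nabla f$; none of these choices come from the slides, and the function names are hypothetical. It tests the inequality on a grid of sample points, so it is a sanity check rather than a proof.

```python
from itertools import product


def grad_f(x):
    """Gradient of the illustrative objective f(x) = (x0 - 2)^2 + (x1 + 1)^2."""
    return (2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0))


def vi_gap(F, x_star, x):
    """Inner product <F(x*), x - x*>; the VI asks for this to be >= 0 for every x in K."""
    return sum(f * (xi - si) for f, xi, si in zip(F(x_star), x, x_star))


def satisfies_vi(F, x_star, samples, tol=1e-12):
    """Check the VI condition at x_star against a finite sample of K."""
    return all(vi_gap(F, x_star, x) >= -tol for x in samples)


# Sample the box K = [0, 1]^2 on an 11 x 11 grid.
grid = [i / 10.0 for i in range(11)]
K_samples = list(product(grid, grid))

x_min = (1.0, 0.0)    # constrained minimizer of f over K
x_other = (0.5, 0.5)  # a feasible point that is not the minimizer

print(satisfies_vi(grad_f, x_min, K_samples))
print(satisfies_vi(grad_f, x_other, K_samples))
```

When $F$ is a gradient, the VI is exactly the first-order optimality condition over $K$; the VI framework becomes strictly more general when $F$ is not the gradient of any function.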

Page 30

Extragradient Method

[Figure: extragradient step in the feasible set K. From $x_k$, a step along $-F(x_k)$ gives $y_k$; a second step from $x_k$ along $-F(y_k)$ gives $x_{k+1}$.]

Korpelevich (1970s) developed the extragradient method
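A small sketch may help show why the extra step matters. It assumes an illustrative operator $F(x) = (x_1, -x_0)$ (a rotation, which is monotone but not a gradient field), the box $K = [-1, 1]^2$, and a fixed step size of 0.5; these are assumptions made for the example, not settings from the talk. The only solution of this VI is the origin.

```python
import math


def F(x):
    """Illustrative monotone operator: a 90-degree rotation, not the gradient of any function."""
    return (x[1], -x[0])


def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box K = [lo, hi]^2."""
    return tuple(min(hi, max(lo, xi)) for xi in x)


def projection_step(x, eta):
    """Plain projection method: x_{k+1} = P_K(x_k - eta * F(x_k))."""
    return project_box(tuple(xi - eta * fi for xi, fi in zip(x, F(x))))


def extragradient_step(x, eta):
    """Extragradient: look ahead to y_k, then step from x_k using F(y_k)."""
    y = project_box(tuple(xi - eta * fi for xi, fi in zip(x, F(x))))
    return project_box(tuple(xi - eta * fi for xi, fi in zip(x, F(y))))


def run(step, x0, eta=0.5, iters=200):
    x = x0
    for _ in range(iters):
        x = step(x, eta)
    return x


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


x0 = (0.5, 0.5)
print(norm(run(projection_step, x0)))     # stays away from the solution
print(norm(run(extragradient_step, x0)))  # shrinks toward the origin
```

On this example the plain projection iterates spiral outward until they hit the boundary of the box, while the extragradient iterates contract toward the solution.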

Page 31

Mirror-Prox: Non-Euclidean Extragradient

(Nemirovski, 2005)

[Figure: mirror-prox. From $\nabla\Phi(x_t)$ in the dual space $\mathbb{R}^n$, a step $-\eta\nabla f(x_t)$ gives $\nabla\Phi(y'_{t+1})$; the extragradient step $-\eta\nabla f(y_{t+1})$, also taken from $\nabla\Phi(x_t)$, gives $\nabla\Phi(x'_{t+1})$, and $\nabla\Phi^*$ maps back to the primal space, where $x_{t+1}$ lies in $X$ inside $D$.]

Page 32

True Gradient TD-Learning: RL meets VI

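Dedicated to Andrew Barto and Richard Sutton for inspiring a generation of researchers to the study of reinforcement learning.

Algorithm 1 TD (1984)

(1) $\delta_t = r_t + \gamma \phi_t'^{\top} \theta_t - \phi_t^{\top} \theta_t$

(2) $\theta_{t+1} = \theta_t + \beta_t \delta_t \phi_t$

Algorithm 2 GTD2-MP (2014)

(1) $w_{t+\frac{1}{2}} = w_t + \beta_t (\delta_t - \phi_t^{\top} w_t) \phi_t$,
    $\theta_{t+\frac{1}{2}} = \mathrm{prox}_{\alpha_t h}\big(\theta_t + \alpha_t (\phi_t - \gamma \phi_t') (\phi_t^{\top} w_t)\big)$

(2) $\delta_{t+\frac{1}{2}} = r_t + \gamma \phi_t'^{\top} \theta_{t+\frac{1}{2}} - \phi_t^{\top} \theta_{t+\frac{1}{2}}$

(3) $w_{t+1} = w_t + \beta_t (\delta_{t+\frac{1}{2}} - \phi_t^{\top} w_{t+\frac{1}{2}}) \phi_t$,
    $\theta_{t+1} = \mathrm{prox}_{\alpha_t h}\big(\theta_t + \alpha_t (\phi_t - \gamma \phi_t') (\phi_t^{\top} w_{t+\frac{1}{2}})\big)$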

(Mahadevan et al., arXiv 2014)

Page 33

Baird counterexample

Sutton et al., 2009

Our new methods, 2014

Page 34

20-Dimensional Robot Arm

Our new method

Sutton et al., 2009

Page 35

Safe Robot Learning

UBot, Laboratory of Perceptual Robotics

Our new method

Previous method

Thomas, Dabney, Mahadevan, Giguere, NIPS 2013

Page 36

Mirror Descent = “Natural” Gradient (Nemirovski and Yudin; Amari, 1980s)

Natural gradient

Mirror Descent

Thomas, Dabney, Mahadevan, Giguere, NIPS 2013

Mirror Map

Page 37

Part III: Equilibration Framework for ML, CS

Page 38

The “Invisible Hand” of the Internet

“The Internet is an equilibrium, we just have to identify the game” (Scott Shenker)

“The Internet was the first computational artifact that was not created by a single entity, but emerged from the strategic interaction of many” (Christos Papadimitriou)

Adam Smith The Wealth of Nations

1776

Page 39

Competing Goals of the Internet: 1992-2014

Page 40

LA Times Story, June 05 2014: Verizon tells Netflix to stop blaming it for streaming issues

"Netflix has been aware for some time that a few Internet middlemen have congestion issues with some IP Networks and nonetheless, Netflix has chosen to continue sending its traffic over those congested routes," said Verizon General Counsel Randal Milch. (Paul Sakuma / AP)

Verizon wants Netflix to stop blaming the Internet Service Provider for any streaming issues customers may be having when watching TV shows and movies.

Page 41

Part IV: New Algorithms, Applications, Results

Page 42

[Figure: feasible set K with the solution x*, a point x, the vector x - x*, F(x*), and -F(x*) in the normal cone at x*.]

Optimization

Game theory

Nonlinear equation solving

Complementarity problems

Variational Inequalities

Traffic equilibrium problem

21st century ML

20th century ML

Page 43

(Almost) Dimension-Free optimization

To

(Almost) Dimension-Free equilibration

Page 44

Fixed Point Formulation

[Figure: feasible set K with the solution x*, a point x, the vector x - x*, F(x*), and -F(x*) in the normal cone at x*.]

Page 45

Extragradient Method

[Figure: extragradient step in the feasible set K. From $x_k$, a step along $-F(x_k)$ gives $y_k$; a second step from $x_k$ along $-F(y_k)$ gives $x_{k+1}$.]

Korpelevich (1970s) developed the extragradient method

Page 46

Søren Kierkegaard Revisited

Life must be lived going forwards

but can only be understood going backwards

many times in the Dual Space!

Page 47

Our Latest VI Methods (Gemp and Mahadevan, 2014)

Intuitively, the Bregman divergence measures the difference between the value of a strongly convex function $\psi(x)$ and the estimate derived from the first-order Taylor series expansion at $\psi(y)$. Many widely used distance measures turn out to be special cases of Bregman divergences, such as (squared) Euclidean distance (where $\psi(x) = \frac{1}{2}\|x\|^2$) and Kullback-Leibler divergence (where $\psi(x) = \sum_i x_i \log_2 x_i$, the negative entropy function). In general, Bregman divergences are non-symmetric, but projections onto a convex set with respect to a Bregman divergence are well-defined. The general mirror descent procedure can thus be defined as:

$x_{k+1} = \arg\min_{x \in K} \left( \langle x, \partial f(x_k) \rangle + \frac{1}{\alpha_k} D_\psi(x, x_k) \right)$ (3)

The solution to this optimization problem can be stated succinctly as the following generalized gradient descent algorithm, which forms the core procedure in mirror descent:

$x_{k+1} = \nabla\psi^*\left(\nabla\psi(x_k) - \alpha_k \partial f(x_k)\right)$ (4)

Here, $\psi^*$ is the Legendre transform of the strongly convex function $\psi$, which is defined as

$\psi^*(y) = \sup_{x \in K} \left( \langle x, y \rangle - \psi(x) \right)$

Equation 4 carries out the gradient update in two stages. In the first step, the gradient is computed and $x_k$ is updated in the dual space. Subsequently the updated $x_k$ is mapped back into the primal space.
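The two-stage update can be made concrete with the negative entropy potential on the probability simplex. The sketch below is illustrative only: it uses natural logarithms rather than $\log_2$ (which changes the potential by a constant factor), takes $K$ to be the simplex so that the gradient of the conjugate is a softmax, and minimizes an assumed linear cost $f(x) = \langle c, x \rangle$; the names are hypothetical.

```python
import math


def grad_psi(x):
    """Primal -> dual: gradient of psi(x) = sum_i x_i ln x_i."""
    return [1.0 + math.log(xi) for xi in x]


def grad_psi_star(y):
    """Dual -> primal: gradient of the conjugate taken over the simplex (a softmax)."""
    m = max(y)  # shift for numerical stability
    e = [math.exp(yi - m) for yi in y]
    s = sum(e)
    return [ei / s for ei in e]


def bregman_entropy(x, y):
    """D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y> for the entropy potential."""
    psi = lambda v: sum(vi * math.log(vi) for vi in v)
    lin = sum(g * (xi - yi) for g, xi, yi in zip(grad_psi(y), x, y))
    return psi(x) - psi(y) - lin


def mirror_descent_step(x, grad, alpha):
    """Equation (4): gradient step in the dual space, then map back to the primal space."""
    y = [gi - alpha * g for gi, g in zip(grad_psi(x), grad)]
    return grad_psi_star(y)


c = [0.3, 0.1, 0.7]  # assumed linear cost; its gradient is c everywhere
x = [1.0 / 3.0] * 3
for _ in range(200):
    x = mirror_descent_step(x, c, alpha=0.5)
print(x)  # mass concentrates on the coordinate with the smallest cost
```

For two points on the simplex, `bregman_entropy` coincides with the Kullback-Leibler divergence measured in nats, which matches the special case mentioned in the text up to the choice of logarithm base.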

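General Runge-Kutta Mirror Descent (RKMDA)

$k_1 = \alpha_k F(x_k)$

$k_2 = \alpha_k F\left(\nabla\psi_k^*\left(\nabla\psi_k(x_k) - a_{21} k_1\right)\right)$

$k_3 = \alpha_k F\left(\nabla\psi_k^*\left(\nabla\psi_k(x_k) - a_{31} k_1 - a_{32} k_2\right)\right)$

$\vdots$

$k_s = \alpha_k F\left(\nabla\psi_k^*\left(\nabla\psi_k(x_k) - a_{s1} k_1 - a_{s2} k_2 - \ldots - a_{s,s-1} k_{s-1}\right)\right)$

$x_{k+1} = \nabla\psi_k^*\left(\nabla\psi_k(x_k) - \sum_{i=1}^{s} b_i k_i\right)$

Figure 3: The proposed Runge Kutta family of algorithms for solving $VI(F, K)$.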
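One step of this family can be written generically in terms of a Butcher tableau $(a_{ij}, b_i)$ and a pair of mirror maps. The sketch below is an assumption-laden illustration: it uses the identity for both mirror maps (the unconstrained Euclidean case), the rotation operator $F(x) = (x_1, -x_0)$ as a test problem, and two standard two-stage tableaus; the function and variable names are hypothetical.

```python
def rk_mirror_step(x, F, alpha, A, b, to_dual, to_primal):
    """One Runge-Kutta mirror descent step with stage coefficients A and weights b."""
    z = to_dual(x)
    ks = []
    for i in range(len(b)):
        # Stage i is evaluated at the primal image of z - sum_{m < i} a_im * k_m.
        shifted = [zj - sum(A[i][m] * ks[m][j] for m in range(i))
                   for j, zj in enumerate(z)]
        ks.append([alpha * f for f in F(to_primal(shifted))])
    z_new = [zj - sum(b[i] * ks[i][j] for i in range(len(b)))
             for j, zj in enumerate(z)]
    return to_primal(z_new)


def identity(v):
    """Euclidean mirror map with no constraints (an assumption for this example)."""
    return list(v)


def F(x):
    """Illustrative monotone operator: a 90-degree rotation."""
    return [x[1], -x[0]]


# Two-stage tableaus: (a21 = 1, b = (0, 1)) reproduces the extragradient update;
# (a21 = 1, b = (1/2, 1/2)) is Heun's method.
EXTRAGRADIENT = ([[0.0, 0.0], [1.0, 0.0]], [0.0, 1.0])
HEUN = ([[0.0, 0.0], [1.0, 0.0]], [0.5, 0.5])

x0 = [0.5, 0.5]
x_eg = rk_mirror_step(x0, F, 0.5, EXTRAGRADIENT[0], EXTRAGRADIENT[1], identity, identity)
x_heun = rk_mirror_step(x0, F, 0.5, HEUN[0], HEUN[1], identity, identity)
print(x_eg, x_heun)
```

Seen this way, the extragradient method is one member of the family, and higher-order tableaus give other members; replacing `identity` with a non-Euclidean pair of maps gives the mirror descent variants.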

A detailed theoretical analysis of the proposed RKMDA method is the main topic of a subsequent paper, as it requires introducing lengthy technical issues dealing with discontinuous differential equations and projected dynamical systems, beyond the scope of this introductory paper on VIs.

4 Experimental Results

4.1 Service Oriented Internet

There are dozens of real-world applications of VIs [9], from supply chain manufacturing to financial networks and traffic equilibrium problems. Many of these problems are beyond the scope of classical optimization methods, since they require dealing with asymmetric Jacobian mappings. Given space constraints, we select two representative network equilibrium problems of current interest, namely a “next-generation” economic model of the Internet [10], and a sustainable environmentally conscious supply chain network. This game-theoretic model of a service-oriented Internet has members of the network (see Figure 4) compete to maximize profits by adjusting the quantity, quality, and price of services delivered. Service providers (e.g., Netflix, Amazon) play a Cournot-Nash game controlling the quantities of services provided while network providers (e.g., Verizon, AT&T) play a Bertrand game controlling the delivery price as well as service quality. Consumers influence the network through demand functions dictating the prices they are willing to pay for specific quantities and qualities of services rendered. The variational inequality in Figure 4 is defined in terms of the service quantity ($Q$), quality ($q$), and price ($\pi$) delivered from service provider $i$ by network provider $j$ to

Page 48

Benchmark VI Problem

update is defined as:

$x^{k+1}_j = \dfrac{x^k_j \, e^{-\alpha_k F_j(x^k)}}{\sum_{i=1}^{n} x^k_i \, e^{-\alpha_k F_i(x^k)}}$ (5)
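A direct implementation of update (5) can overflow when the entries of $\alpha_k F(x^k)$ are large. The following sketch shifts the exponents by their maximum before exponentiating, which leaves the normalized result unchanged; it reads the exponent componentwise (entry $j$ of $F$ for coordinate $j$), and the function name is hypothetical.

```python
import math


def entropic_update(x, Fx, alpha):
    """Multiplicative update (5) on the simplex, computed with a max-shift for stability."""
    scores = [-alpha * f for f in Fx]
    # Only coordinates with positive mass can receive mass after the update.
    m = max(s for s, xi in zip(scores, x) if xi > 0.0)
    w = [xi * math.exp(s - m) if xi > 0.0 else 0.0 for xi, s in zip(x, scores)]
    total = sum(w)
    return [wi / total for wi in w]


print(entropic_update([0.5, 0.5], [0.0, math.log(2.0)], alpha=1.0))
print(entropic_update([0.5, 0.5], [1000.0, -1000.0], alpha=1.0))
```

Because numerator and denominator are scaled by the same factor, the shift does not change the update; it only keeps the intermediate exponentials within floating-point range.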

The learning rates in our approach were adapted as described in Figure 5 for the Euler-Heun method and using the Cash-Karp method [7]. We compared our methods with the procedure developed by Dang and Lan [9], which is essentially the mirror-prox method [18] with an extra line search step. Their approach, which we refer to below as $DL_{Best}$, is described below as Algorithm 3. The benchmark VI problems that we used are as follows:

Algorithm 3 Non-Euclidean Extragradient VI Method proposed by Dang and Lan [9]

INPUT: Given initial point $x_1 \in K$, initial step size $\gamma_0 \in (0, 1)$, and $\beta \in (0, 1)$.

1: Set $k = 1$.
2: Compute $y_k = \nabla\psi^*\left(\nabla\psi(x_k) - \gamma_k F(x_k)\right)$.
3: Compute the residual $R_{\gamma_0}(x_k) = \frac{1}{\gamma_0}\left(x_k - \Pi_K\left(x_k - \gamma_0 F(x_k)\right)\right)$. If $\|R_{\gamma_0}(x_k)\| = 0$, terminate the algorithm. Otherwise, do a line search to find the learning rate $\gamma_k$ with the largest value from the list $\{\gamma_0, \gamma_0\beta, \gamma_0\beta^2, \ldots\}$ such that $\|F(x_k) - F(y_k)\|_*^2 \le \alpha_k^2 D_\psi(x_k, y_k)$.
4: Set $x_{k+1} = \nabla\psi^*\left(\nabla\psi(x_k) - \gamma_k F(y_k)\right)$.
5: Set $k \leftarrow k + 1$ and go to step 2.

1. Kojima-Shindo problem: These problems are from [37]. $K = \{x \in \mathbb{R}^4 \mid \sum_{i=1}^{4} x_i = 1\}$, and the operator $F : \mathbb{R}^4 \to \mathbb{R}^4$ is defined as:

$F(x) = \begin{pmatrix} 3x_1^2 + 2x_1x_2 + 2x_2^2 + x_3 + 3x_4 - 6 \\ 2x_1^2 + x_1 + x_2^2 + 10x_3 + 2x_4 - 2 \\ 3x_1^2 + x_1x_2 + 2x_2^2 + 2x_3 + 9x_4 - 9 \\ x_1^2 + 3x_2^2 + 2x_3 + 3x_4 - 3 \end{pmatrix}$.

2. Watson (WAT) problem: These were proposed by Watson in [51]. The operator $F : \mathbb{R}^{10} \to \mathbb{R}^{10}$ is given by the affine mapping $F(x) = Ax + b$, where

$A = \begin{pmatrix}
0 & 0 & -1 & -1 & -1 & 1 & 1 & 0 & 1 & 1 \\
-2 & -1 & 0 & 1 & 1 & 2 & 2 & 0 & -1 & 0 \\
1 & 0 & 1 & -2 & -1 & -1 & 0 & 2 & 0 & 0 \\
2 & 1 & -1 & 0 & 1 & 0 & -1 & -1 & -1 & 1 \\
-2 & 0 & 1 & 1 & 0 & 2 & 2 & -1 & 1 & 0 \\
-1 & 0 & 1 & 1 & 1 & 0 & -1 & 2 & 0 & 1 \\
0 & -1 & 1 & 0 & 2 & -1 & 0 & 0 & 1 & -1 \\
0 & -2 & 2 & 0 & 0 & 1 & 2 & 2 & -1 & 0 \\
0 & -1 & 0 & 2 & 2 & 1 & 1 & 1 & -1 & 0 \\
2 & -1 & -1 & 0 & 1 & 0 & 0 & -1 & 2 & 2
\end{pmatrix}$

and $b = e_i$, the unit vector. There are 10 different instances of the Watson problem obtained for $e_1, \ldots, e_{10}$.

3. Sun problem: This problem was proposed in [45]. The affine operator $F : \mathbb{R}^n \to \mathbb{R}^n$ is again given by $F(x) = Ax + b$, where

$A = \begin{pmatrix}
1 & 2 & 2 & \ldots & 2 \\
0 & 1 & 2 & \ldots & 2 \\
0 & 0 & 1 & \ldots & 2 \\
\vdots & \vdots & \vdots & \ddots & 2 \\
0 & 0 & 0 & \ldots & 1
\end{pmatrix}$

and $b = (-1, \ldots, -1)^{\top}$. The problem instances ranged from $n = 8000$ to $n = 30{,}000$.

Page 49

Results on Benchmark VI

Figure 6: This figure compares our proposed adaptive step size Runge Kutta VI methods described in Figure 4 against a recently proposed non-Euclidean extragradient method by Dang and Lan [9] (Algorithm 3 above). See text for explanation.

Page 50

[Figure: network with three tiers: Service Providers 1, ..., i, ..., m; Network Providers 1, ..., j, ..., n; Demand Markets 1, ..., k, ..., o.]

Figure 1: The Network Structure of the Cournot-Nash-Bertrand Model for a Service-Oriented Internet

Professor Anna Nagurney, Network Economics and the Internet

Cournot-Nash game

Bertrand game

Next Generation Internet Model [Nagurney et al., 2014]

Page 51

Problem Formulation

[Figure: network with three tiers: Service Providers 1, ..., i, ..., m; Network Providers 1, ..., j, ..., n; Demand Markets 1, ..., k, ..., o.]

Figure 1: The Network Structure of the Cournot-Nash-Bertrand Model for a Service-Oriented Internet

Table 1: Notation for the Game Theoretic Cournot-Nash-Bertrand Model

Notation: Definition

$Q_{ijk}$: the nonnegative service volume from $i$ to $k$ via $j$. We group the $\{Q_{ijk}\}$ elements for all $j$ and $k$ into the vector $Q_i \in \mathbb{R}^{no}_+$ and then we group all the vectors $Q_i$ for all $i$ into the vector $Q \in \mathbb{R}^{mno}_+$.

$s_i$: the service volume (output) produced by service provider $i$. We group the $\{s_i\}$ elements into the vector $s \in \mathbb{R}^{m}_+$.

$q_{ijk}$: the nonnegative quality level of network provider $j$ transporting service $i$ to $k$. We group the $q_{ijk}$ for all $i$ and $k$ into the vector $q_j \in \mathbb{R}^{mo}_+$ and all the vectors $q_j$ for all $j$ into the vector $q \in \mathbb{R}^{mno}_+$.

$\pi_{ijk}$: the price charged by network provider $j$ for transporting a unit of service provided by $i$ via $j$ to $k$. We group the $\pi_{ijk}$ for all $i$ and $k$ into the vector $\pi_j \in \mathbb{R}^{mo}_+$ and then we group all the vectors $\pi_j$ for all $j$ into the vector $\pi \in \mathbb{R}^{mno}_+$.

$f_i(s)$: the total production cost of service provider $i$.

$\rho_{ijk}(Q, q)$: the demand price at $k$ associated with service $i$ transported via $j$.

$c_{ijk}(Q, q)$: the total transportation cost associated with delivering service $i$ via $j$ to $k$.

$oc_{ijk}(\pi_{ijk})$: the opportunity cost associated with pricing by network provider $j$ services transported from $i$ to $k$.

Page 52

VI Formulation

Production cost function f(Q): cost of providing a certain volume of content

Demand price function: user offer depends on content quality and market volume

Page 53

Simple Example

[Figure: Service Providers 1 and 2 connect to Network Provider 1, which connects to Demand Market 1.]

Figure 3: Network Topology for Another Illustrative Example

The production cost functions are:

$f_1(Q) = Q_{111}^2 + Q_{111}, \quad f_2(Q) = 2Q_{211}^2 + Q_{211}.$

The demand price functions are:

$\rho_{111}(Q, q) = -Q_{111} - .5Q_{211} + .5q_{111} + 100, \quad \rho_{211}(Q, q) = -Q_{211} - .5Q_{111} + .5q_{211} + 200.$

The transportation cost functions are:

$c_{111}(Q, q) = .5(q_{111} - 20)^2, \quad c_{211}(Q, q) = .5(q_{211} - 10)^2,$

with the opportunity cost functions being:

$oc_{111}(\pi_{111}) = \pi_{111}^2, \quad oc_{211}(\pi_{211}) = \pi_{211}^2.$

Using (24) through (26), we construct the following:

$F^1_{111}(X) = 2Q_{111} + 1 + \pi_{111} + Q_{111} + .5Q_{211} - .5q_{111} - 100 + Q_{111},$

$F^1_{211}(X) = 4Q_{211} + 1 + \pi_{211} + Q_{211} + .5Q_{111} - .5q_{211} - 200 + Q_{211},$

$F^2_{111}(X) = q_{111} - 20, \quad F^2_{211}(X) = q_{211} - 10,$

$F^3_{111}(X) = -Q_{111} + 2\pi_{111}, \quad F^3_{211}(X) = -Q_{211} + 2\pi_{211}.$

Solving, as for the first example in Section 2.1, we obtain:

$Q^*_{111} = 21.00, \quad Q^*_{211} = 30.00,$

$q^*_{111} = 20.00, \quad q^*_{211} = 10.00,$

$\pi^*_{111} = 10.50, \quad \pi^*_{211} = 15.00.$

The profits for the service providers are: $U^1_1(X^*) = 875.00$ and $U^1_2(X^*) = 2660.00$, whereas the profit for the network provider, $U^2_1(X^*) = 331.00$.

Page 54

Example

Page 55

Example Results

Page 56

Results on Internet VI Problem

Our new algorithm

Extragradient

Page 57

Sustainable Supply Chain

Nagurney et al.

Page 58

Results on Sustainable Supply Chain VI Problem

Our new algorithm

Extragradient

Page 59

[Figure: feasible set K with the solution x*, a point x, the vector x - x*, F(x*), and -F(x*) in the normal cone at x*.]

Optimization

Game theory

Nonlinear equation solving

Complementarity problems

Variational Inequalities

Traffic equilibrium problem

21st century ML

20th century ML

Page 60

Questions?