Welcome to Bowdoin | Bowdoin College - Birds, books, and …tpietrah/TALKS/bbm.pdf · 2018. 3....

Birds, books, and matrices: a brief adventure in artificial intelligence and neural networks

Thomas Pietraho
Spring, 2018

I am an algebraist

Neural networks: major successes

Neural nets can recognize images

car ball bridge burrito

Current accuracy ≈ 95%

Neural nets can translate

Polish: mój poduszkowiec jest pełen węgorzy

English: my hovercraft is full of eels

Google’s version is very close to human translation for a number of languages. Not Chinese.

Neural nets can play games

Top ranked Go player defeated by AlphaGo (4-1). AlphaGo destroyed by AlphaGo Zero (100-0).

Image by Saran Poroong

Neural networks: minor successes

Neural nets can judge a book by its cover

history science romance sports

Problem: Predict book genre based on its cover.

Accuracy 76%.

with Parikshit Sharma, ’17, IndieBio

Neural nets can identify birds

cardinal wood duck anhinga chickadee

Problem: Predict species of bird based on image.

Accuracy 87%. (P., 2017)

american crow fish crow common raven

Neural nets can be useful to an algebraist?

From The Accountant

What are neural nets?

Neural nets are functions

x → y
2 → 4
3 → 9
4 → 16
(1.00, 4.98, 7.21, 9.89, 1.01, 2.30) → (3.72, 2.67, 22.01, 1.92, 3.70)
(1.00, 4.98, 7.21, 9.89, 1.01, 2.30) → (0, 0, 0, 1, 0)

In this form, neural nets can carry out

• regression, or
• classification

Image courtesy of JD Cruzan

Neural nets are made up of “neurons”

Two parameters: laziness and loudness. These specify a neuron’s activation function.
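The slide does not define "laziness" and "loudness" precisely. One possible reading, offered here purely as an assumption, is that laziness acts as a threshold the input must exceed before the neuron responds, and loudness scales how strongly it responds. The function name `neuron` and the choice of a sigmoid are hypothetical and not taken from the talk.

```python
import math

def neuron(x, laziness, loudness):
    """Hypothetical neuron: a sigmoid shifted by `laziness` and scaled by `loudness`.

    Assumption: laziness is a threshold on the input, loudness is the
    maximum output. The talk may intend a different parametrization.
    """
    return loudness / (1.0 + math.exp(-(x - laziness)))

# A lazy neuron barely responds to a small input; a less lazy one responds more.
quiet = neuron(1.0, laziness=5.0, loudness=2.0)
eager = neuron(1.0, laziness=0.0, loudness=2.0)
```

Under this reading, an input exactly at the laziness threshold produces half of the maximum loudness.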

Neural nets are networks of neurons

output

input

Neural nets are universal

Theorem (G. Cybenko, 1989): Every continuous function on a closed, bounded domain can be approximated arbitrarily well by a neural network.

Examples of functions: image classification, language translation, etc.

Question: Why no self-driving cars in the 1990s?
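For reference, a standard formulation of Cybenko's theorem, recalled from the general literature rather than from the talk, is an approximation statement. The symbols σ, N, α_j, w_j, and b_j are notation introduced here for illustration.

```latex
% Assumption: \sigma is a continuous sigmoidal function, i.e.
% \sigma(t) \to 1 as t \to +\infty and \sigma(t) \to 0 as t \to -\infty.
Let $f \colon [0,1]^n \to \mathbb{R}$ be continuous and let $\varepsilon > 0$.
Then there exist $N \in \mathbb{N}$, $\alpha_j, b_j \in \mathbb{R}$, and
$w_j \in \mathbb{R}^n$ such that
\[
  \Bigl| \sum_{j=1}^{N} \alpha_j \, \sigma\bigl(w_j^{\top} x + b_j\bigr) - f(x) \Bigr| < \varepsilon
  \qquad \text{for all } x \in [0,1]^n .
\]
```

The theorem guarantees that suitable parameters exist. It does not say how to find them, which is one way to read the question about the 1990s.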

Learning with neural nets

Procedure:

• assemble a neural network (craft)
• adjust laziness and loudness for each neuron (math)
• measure error based on a sample of data and repeat (fast processors)

Advances in all three parts of this process are responsible for the machine learning revolution since 2012.
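The three-step procedure can be sketched on the smallest possible "network": a single linear neuron trained by gradient descent. This is an illustrative sketch only. The names `train_linear_neuron`, `weight`, and `bias` are hypothetical, and the talk does not say which optimizer or error measure was used; mean squared error and plain gradient descent are assumed here.

```python
def train_linear_neuron(samples, learning_rate=0.1, steps=500):
    """Fit y = weight * x + bias to (x, y) samples by gradient descent."""
    weight, bias = 0.0, 0.0            # step 1: assemble the (tiny) network
    n = len(samples)
    for _ in range(steps):             # step 3: measure error and repeat
        grad_w = 0.0
        grad_b = 0.0
        for x, y in samples:
            error = (weight * x + bias) - y
            # Gradient of the mean squared error with respect to each parameter.
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        weight -= learning_rate * grad_w   # step 2: adjust the parameters
        bias -= learning_rate * grad_b
    return weight, bias

# Sample data drawn from the target function y = 3x + 1.
samples = [(x, 3.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
weight, bias = train_linear_neuron(samples)
```

On this sample the parameters settle near weight = 3 and bias = 1. Real networks repeat the same loop over many more parameters, which is where fast processors matter.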

Image courtesy of Kaiming He

We don’t completely understand why neural nets work

Image courtesy of Elsayed et al.

A problem in algebra

Matrix multiplication

\[
\begin{pmatrix} 0.98 & 0.23 & 0.12 \\ 0.12 & 0.34 & 0.67 \\ 0.11 & 0.54 & 0.18 \end{pmatrix}
\cdot
\begin{pmatrix} 0.56 & 0.09 & 0.10 \\ 0.99 & 0.45 & 0.41 \\ 0.39 & 0.02 & 0.11 \end{pmatrix}
=
\begin{pmatrix} 0.82 & 0.19 & 0.21 \\ 0.67 & 0.18 & 0.23 \\ 0.67 & 0.26 & 0.25 \end{pmatrix}
\]

This process is a mess, involving lots of ordinary addition and multiplication. But it is an important mess.

Goal: minimize number of ordinary multiplications: “rank”

matrix size      rank
2 × 2            8, improved to 7 (Strassen, 1969)
3 × 3            27, improved to 23 (Laderman, 1976)
4 × 4            64, improved to 49 (Strassen, 1969), then 48 (Stothers, 2012)
1000 × 1000      10^9, improved to 264M (Strassen, 1969), then 238M (Stothers, 2012)
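The baseline counts 8, 27, and 64 come from the schoolbook method, which uses one ordinary multiplication per term of each output entry. The sketch below counts them directly and reproduces the 3 × 3 product from the slide. The function name `multiply_counting` is hypothetical.

```python
def multiply_counting(A, B):
    """Schoolbook matrix product that also counts scalar multiplications."""
    rows, inner, cols = len(A), len(B), len(B[0])
    count = 0
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                C[i][j] += A[i][k] * B[k][j]
                count += 1             # one ordinary multiplication per term
    return C, count

# The 3 x 3 example from the slide.
A = [[0.98, 0.23, 0.12], [0.12, 0.34, 0.67], [0.11, 0.54, 0.18]]
B = [[0.56, 0.09, 0.10], [0.99, 0.45, 0.41], [0.39, 0.02, 0.11]]
C, count = multiply_counting(A, B)
rounded = [[round(value, 2) for value in row] for row in C]
```

For n × n matrices the count is n³, which gives 10^9 for the 1000 × 1000 case. The improved ranks in the table come from cleverer schemes that trade multiplications for extra additions.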

A little insight

A neural network can model matrix multiplication:

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\cdot
\begin{pmatrix} e & f \\ g & h \end{pmatrix}
=
\begin{pmatrix} i & j \\ k & l \end{pmatrix}
\]

Question: can our methods learn this network?
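One way such a network could be organized, sketched here as an assumption about the architecture rather than a description of the one used in the talk, is a single hidden layer in which each hidden unit multiplies a linear combination of (a, b, c, d) by a linear combination of (e, f, g, h). The number of hidden units is then the number of ordinary multiplications. The weights below are not learned: they are Strassen's known coefficients, included to show that seven hidden units suffice for the 2 × 2 case. The names `U`, `V`, `W`, and `bilinear_network` are hypothetical.

```python
# Rows of U act on (a, b, c, d); rows of V act on (e, f, g, h).
U = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
     [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]]
V = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
     [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
# Rows of W combine the seven products into the outputs (i, j, k, l).
W = [[1, 0, 0, 1, -1, 0, 1],
     [0, 0, 1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0, 0, 0],
     [1, -1, 1, 0, 0, 1, 0]]

def dot(row, vector):
    return sum(r * v for r, v in zip(row, vector))

def bilinear_network(left, right, U, V, W):
    """One multiplication per hidden unit, then a linear output layer."""
    hidden = [dot(u, left) * dot(v, right) for u, v in zip(U, V)]
    return [dot(w, hidden) for w in W]

# (1 2; 3 4) times (5 6; 7 8), entries listed row by row.
product = bilinear_network([1, 2, 3, 4], [5, 6, 7, 8], U, V, W)
```

In this picture, learning a lower rank means finding weights for a network with fewer hidden units that still reproduces the product.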

Will it learn?

Thanks: Dj and HPC

[Plot: error over learning time]

matrix size      rank
2 × 2            8 X, 7 X, 6 X
2 × 3            11 X, 10 X
3 × 2            15 X, 14 X
3 × 3            23 X, 22 X
4 × 4            49 X, 48 X, 47 X, 46 X, 45 X, 44 X

Upshot: This result reduces the computational cost for 1000 × 1000 matrix multiplication from 238M to 172M ordinary multiplications!
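The 1000 × 1000 figures quoted in the talk are consistent with a standard extrapolation: if n × n matrices can be multiplied with r ordinary multiplications, applying the scheme recursively to blocks costs roughly r^(log_n 1000) multiplications for 1000 × 1000 matrices. The sketch below assumes that formula; the talk does not state how its numbers were computed, and the function name `extrapolated_cost` is hypothetical.

```python
import math

def extrapolated_cost(size, base_size, base_rank):
    """Estimated multiplications for size x size matrices, assuming a
    base_size x base_size scheme of rank base_rank is applied recursively.

    This treats log_base_size(size) as a real number, so it is a smooth
    estimate rather than an exact count when size is not a power of base_size.
    """
    return base_rank ** (math.log(size) / math.log(base_size))

strassen = extrapolated_cost(1000, 2, 7)       # about 264 million
rank_48 = extrapolated_cost(1000, 4, 48)       # about 238 million
rank_45 = extrapolated_cost(1000, 4, 45)       # about 172 million
```

Under this assumption, the quoted drop from 238M to 172M matches lowering the 4 × 4 rank from 48 to 45.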

I am an algebraist

Luckily (for algebraists), the neural network solution is only an approximation.

Question: how can one obtain an exact solution?

Hint: algebra
