Artificial Intelligence CIS 342
The College of Saint Rose
David Goldschmidt, Ph.D.

Page 1: Artificial Intelligence CIS 342

Artificial Intelligence

CIS 342

The College of Saint Rose
David Goldschmidt, Ph.D.

Page 2: Artificial Intelligence CIS 342

Machine learning involves adaptive mechanisms that enable computers to:
– Learn from experience
– Learn by example
– Learn by analogy

Learning capabilities improve the performance of intelligent systems over time

Machine Learning

Page 3: Artificial Intelligence CIS 342

How do brains work?
– How do human brains differ from those of other animals?

Can we base models of artificial intelligence on the structure and inner workings of the brain?

The Brain

Page 4: Artificial Intelligence CIS 342

The human brain consists of:
– Approximately 10 billion neurons
– …and 60 trillion connections

The brain is a highly complex, nonlinear, parallel information-processing system
– By firing neurons simultaneously, the brain performs faster than the fastest computers in existence today

The Brain

Page 5: Artificial Intelligence CIS 342

Building blocks of the human brain:

[Diagram: two biological neurons, with labeled somas, dendrites, axons, and synapses]

The Brain

Page 6: Artificial Intelligence CIS 342

An individual neuron has a very simple structure
– The cell body is called a soma
– Small connective fibers are called dendrites
– Single long fibers are called axons

An army of such elements constitutes tremendous processing power

The Brain

Page 7: Artificial Intelligence CIS 342

An artificial neural network consists of a number of very simple processors called neurons

– Neurons are connected by weighted links

– The links pass signals from one neuron to another based on predefined thresholds

Artificial Neural Networks

Page 8: Artificial Intelligence CIS 342

An individual neuron (McCulloch & Pitts, 1943):
– Computes the weighted sum of the input signals
– Compares the result with a threshold value, θ
– If the net input is less than the threshold, the neuron output is –1 (or 0)
– Otherwise, the neuron becomes activated and its output is +1

Artificial Neural Networks

Page 9: Artificial Intelligence CIS 342

Artificial Neural Networks

[Diagram: neuron Y receives input signals x1, x2, ..., xn over weighted links w1, w2, ..., wn and produces output signal Y]

X = x1w1 + x2w2 + ... + xnwn

X is then compared against the threshold θ

Page 10: Artificial Intelligence CIS 342

Individual neurons adhere to an activation function, which determines whether they propagate their signal (i.e. activate) or not:

Sign Function

Activation Functions

X = x1w1 + x2w2 + ... + xnwn

Y = +1 if X ≥ θ
Y = –1 if X < θ

Page 11: Artificial Intelligence CIS 342

Activation Functions

[Graphs: step, sign, sigmoid, and linear activation functions]

Ystep = 1 if X ≥ 0; 0 if X < 0

Ysign = +1 if X ≥ 0; –1 if X < 0

Ysigmoid = 1 / (1 + e^(–X))

Ylinear = X

Page 12: Artificial Intelligence CIS 342

The step and sign activation functions are often called hard limit functions

We use such functions in decision-making neural networks
– They support classification and other pattern recognition tasks

Activation Functions

Write functions or methods for the activation functions on the previous slide
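The exercise above might be sketched as follows (the slides name no language, so Python is an assumption here), with one function per activation function from the previous slide:

```python
import math

def step(x):
    """Step activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def sign(x):
    """Sign activation: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid activation: squashes any x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Linear activation: output equals the net input."""
    return x
```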

Page 13: Artificial Intelligence CIS 342

Can an individual neuron learn?
– In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a single-node neural network

– Rosenblatt's perceptron model consists of a single neuron with adjustable synaptic weights, followed by a hard limiter

Perceptrons

Page 14: Artificial Intelligence CIS 342

Perceptrons

[Diagram: two-input perceptron with inputs x1, x2, weights w1, w2, a linear combiner, threshold θ, and a hard limiter producing output Y]

X = x1w1 + x2w2

Y = Ystep

Write code for a single two-input neuron (see the diagram above)

Set w1, w2, and Θ through trial and error to obtain a logical AND of inputs x1 and x2
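One possible answer to this exercise, sketched in Python; the values w1 = w2 = 0.5 and Θ = 0.8 are just one trial-and-error choice that happens to work, not the only one:

```python
def neuron(x1, x2, w1, w2, theta):
    """Two-input neuron: linear combiner followed by a step hard limiter."""
    X = x1 * w1 + x2 * w2          # X = x1w1 + x2w2
    return 1 if X >= theta else 0  # hard limiter at threshold theta

# Trial-and-error weights for logical AND: only input (1, 1) reaches the threshold
w1, w2, theta = 0.5, 0.5, 0.8
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", neuron(x1, x2, w1, w2, theta))
```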

Page 15: Artificial Intelligence CIS 342

A perceptron:
– Classifies inputs x1, x2, ..., xn into one of two distinct classes A1 and A2

– Forms a linearly separable function defined by:

x1w1 + x2w2 + ... + xnwn – θ = 0

Perceptrons

[Figure: (a) two-input perceptron with decision boundary x1w1 + x2w2 – θ = 0; (b) three-input perceptron with decision boundary x1w1 + x2w2 + x3w3 – θ = 0]

Page 16: Artificial Intelligence CIS 342

A perceptron with three inputs x1, x2, and x3 classifies its inputs into two distinct sets A1 and A2

Perceptrons

[Figure: the two-input and three-input perceptron decision boundaries from the previous slide]

Page 17: Artificial Intelligence CIS 342

How does a perceptron learn?
– A perceptron has initial (often random) weights, typically in the range [-0.5, 0.5]
– Apply an established training dataset
– Calculate the error as expected output minus actual output:

error e = Yexpected – Yactual

– Adjust the weights to reduce the error

Perceptrons

Page 18: Artificial Intelligence CIS 342

How do we adjust a perceptron's weights to produce Yexpected?
– If e is positive, we need to increase Yactual (and vice versa)

– Use this formula:

wi = wi + Δwi, where Δwi = α × xi × e

α is the learning rate (between 0 and 1)
e is the calculated error

Perceptrons
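A single application of this update rule might look like the following sketch, where the neuron fired (Yactual = 1) although Yexpected = 0, so e = –1 and only the weight on the active input moves:

```python
ALPHA = 0.1  # learning rate

def update(weights, inputs, e):
    """Perceptron learning rule: wi = wi + alpha * xi * e."""
    return [w + ALPHA * x * e for w, x in zip(weights, inputs)]

# Input (1, 0) fired incorrectly, so e = -1: w1 decreases by
# alpha * x1 * 1 = 0.1, while w2 is unchanged because x2 = 0
new_w = update([0.3, -0.1], [1, 0], -1)
print(new_w)  # approximately [0.2, -0.1]
```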

Page 19: Artificial Intelligence CIS 342

Train a perceptron to recognize logical AND

Perceptron Example – AND

Use threshold Θ = 0.2 and learning rate α = 0.1

Inputs (x1, x2) are presented each epoch in the order (0,0), (0,1), (1,0), (1,1), with desired outputs 0, 0, 0, 1:

Epoch | x1 x2 | Yd | Initial w1   w2 | Y | e  | Final w1   w2
  1   | 0  0  | 0  |    0.3    -0.1  | 0 |  0 |   0.3    -0.1
      | 0  1  | 0  |    0.3    -0.1  | 0 |  0 |   0.3    -0.1
      | 1  0  | 0  |    0.3    -0.1  | 1 | -1 |   0.2    -0.1
      | 1  1  | 1  |    0.2    -0.1  | 0 |  1 |   0.3     0.0
  2   | 0  0  | 0  |    0.3     0.0  | 0 |  0 |   0.3     0.0
      | 0  1  | 0  |    0.3     0.0  | 0 |  0 |   0.3     0.0
      | 1  0  | 0  |    0.3     0.0  | 1 | -1 |   0.2     0.0
      | 1  1  | 1  |    0.2     0.0  | 1 |  0 |   0.2     0.0
  3   | 0  0  | 0  |    0.2     0.0  | 0 |  0 |   0.2     0.0
      | 0  1  | 0  |    0.2     0.0  | 0 |  0 |   0.2     0.0
      | 1  0  | 0  |    0.2     0.0  | 1 | -1 |   0.1     0.0
      | 1  1  | 1  |    0.1     0.0  | 0 |  1 |   0.2     0.1
  4   | 0  0  | 0  |    0.2     0.1  | 0 |  0 |   0.2     0.1
      | 0  1  | 0  |    0.2     0.1  | 0 |  0 |   0.2     0.1
      | 1  0  | 0  |    0.2     0.1  | 1 | -1 |   0.1     0.1
      | 1  1  | 1  |    0.1     0.1  | 1 |  0 |   0.1     0.1
  5   | 0  0  | 0  |    0.1     0.1  | 0 |  0 |   0.1     0.1
      | 0  1  | 0  |    0.1     0.1  | 0 |  0 |   0.1     0.1
      | 1  0  | 0  |    0.1     0.1  | 0 |  0 |   0.1     0.1
      | 1  1  | 1  |    0.1     0.1  | 1 |  0 |   0.1     0.1

Threshold: Θ = 0.2; learning rate: α = 0.1



Page 21: Artificial Intelligence CIS 342


Repeat until convergence
– i.e. the final weights do not change and there is no error

Perceptron Example – AND

Use threshold Θ = 0.2 and learning rate α = 0.1
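The training run on the previous slides can be checked with a short script, a sketch assuming the table's setup: initial weights w1 = 0.3 and w2 = –0.1, with weights rounded after each update so the arithmetic stays in clean tenths:

```python
def train_and(w1=0.3, w2=-0.1, theta=0.2, alpha=0.1):
    """Train one perceptron on logical AND until a full epoch has no errors."""
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    epoch = 0
    while True:
        epoch += 1
        errors = 0
        for (x1, x2), yd in data:
            y = 1 if x1 * w1 + x2 * w2 >= theta else 0
            e = yd - y
            if e != 0:
                errors += 1
                # rounding avoids float drift right at the threshold
                w1 = round(w1 + alpha * x1 * e, 10)
                w2 = round(w2 + alpha * x2 * e, 10)
        if errors == 0:
            return epoch, w1, w2

print(train_and())  # converges in epoch 5 with final weights (0.1, 0.1)
```

With the converged weights, x1·0.1 + x2·0.1 reaches the threshold 0.2 only when both inputs are 1, which is exactly logical AND.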

Page 22: Artificial Intelligence CIS 342

Two-dimensional plot of the logical AND operation:

[Plot: (a) AND (x1 ∩ x2): points (0,0), (0,1), and (1,0) fall in one class, (1,1) in the other, separated by a straight line]

A single perceptron can be trained to recognize any linearly separable function
– Can we train a perceptron to recognize logical OR?
– How about logical exclusive-OR (i.e. XOR)?

Perceptron Example – AND

Page 23: Artificial Intelligence CIS 342

Two-dimensional plots of logical OR and XOR:

[Plots: (b) OR (x1 ∪ x2): a straight line separates (0,0) from the other three points; (c) Exclusive-OR (x1 ⊕ x2): no single straight line can separate the two classes]

Perceptron – OR and XOR

Page 24: Artificial Intelligence CIS 342

Modify your code to:
– Calculate the error at each step
– Modify weights, if necessary (i.e. if the error is non-zero)
– Loop until all error values are zero for a full epoch

Modify your code to learn to recognize the logical OR operation
– Then try to recognize the XOR operation....

Perceptron Coding Exercise
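One way this exercise might come out in Python (a sketch; the starting weights and the epoch cap are arbitrary choices, not from the slides). OR converges after a few epochs, while XOR never produces an error-free epoch, so the loop needs a cap:

```python
def train(data, w, theta=0.2, alpha=0.1, max_epochs=100):
    """Perceptron training loop: repeat epochs until one has zero error."""
    w = list(w)
    for epoch in range(1, max_epochs + 1):
        had_error = False
        for inputs, yd in data:
            X = sum(x * wi for x, wi in zip(inputs, w))
            y = 1 if X >= theta else 0
            e = yd - y
            if e != 0:
                had_error = True
                w = [round(wi + alpha * x * e, 10)   # rounding keeps tidy tenths
                     for wi, x in zip(w, inputs)]
        if not had_error:
            return epoch, w        # converged
    return None, w                 # gave up: no error-free epoch

OR_DATA  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train(OR_DATA,  [0.3, -0.1]))  # converges
print(train(XOR_DATA, [0.3, -0.1]))  # (None, ...): XOR is not linearly separable
```

The XOR failure is structural, not a tuning problem: no pair (w1, w2) can satisfy w1 ≥ Θ, w2 ≥ Θ, and w1 + w2 < Θ at the same time.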

Page 25: Artificial Intelligence CIS 342

[Diagram: network with an input layer, middle (hidden) layer, and output layer]

Multilayer neural networks consist of:
– An input layer of source neurons
– One or more hidden layers of computational neurons
– An output layer of computational neurons

Input signals are propagated in a layer-by-layer feedforward manner

Multilayer Neural Networks

Page 26: Artificial Intelligence CIS 342

Multilayer Neural Networks

[Diagram: input signals enter the input layer, pass through the middle layer, and leave the output layer as output signals]

Page 27: Artificial Intelligence CIS 342

Multilayer Neural Networks

[Diagram: four-layer network with an input layer, first and second hidden layers, and an output layer; input signals flow forward through the layers to become output signals]

Page 28: Artificial Intelligence CIS 342

Multilayer Neural Networks

[Diagram: input layer neurons 1..n receive inputs x1..xn and connect to hidden layer neurons 1..m through weights wij; hidden neurons connect to output layer neurons 1..l, producing y1..yl, through weights wjk; input signals propagate forward and error signals propagate backward]

XINPUT = x1

XH = x1w11 + x2w21 + ... + xiwi1 + ... + xnwn1

XOUTPUT = yH1w11 + yH2w21 + ... + yHjwj1 + ... + yHmwm1
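These layer-by-layer weighted sums can be sketched as a small forward-pass helper (a sketch: the sigmoid activation and every numeric weight below are made up for illustration, not taken from the slides):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, thetas):
    """One feedforward layer: neuron j outputs sigmoid(sum_i xi * w[i][j] - theta_j)."""
    return [sigmoid(sum(x * w_row[j] for x, w_row in zip(inputs, weights)) - theta)
            for j, theta in enumerate(thetas)]

# Illustrative 2-2-1 network: 2 inputs -> 2 hidden neurons -> 1 output neuron
x = [1, 0]
hidden = layer_forward(x, [[0.5, 0.9], [0.4, 1.0]], [0.8, -0.1])
output = layer_forward(hidden, [[-1.2], [1.1]], [0.3])
print(hidden, output)
```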

Page 29: Artificial Intelligence CIS 342

Three-layer network:

[Diagram: inputs x1 and x2 feed hidden neurons 3 and 4 through weights w13, w23, w14, w24; the hidden neurons feed output neuron 5 (output y5) through weights w35 and w45; each neuron also receives a fixed input of –1 weighted by its threshold θ]

Multilayer Neural Networks

Page 30: Artificial Intelligence CIS 342

Commercial-quality neural networks often incorporate 4 or more layers
– Each layer consists of about 10 to 1,000 individual neurons

Experimental and research-based neural networks often use 5 or 6 (or more) layers
– Overall, millions of individual neurons may be used

Multilayer Neural Networks

Page 31: Artificial Intelligence CIS 342

A back-propagation neural network is a multilayer neural network that propagates error backwards through the network as it learns
– Weights are modified based on the calculated error

– Training is complete when the error is below a specified threshold (e.g. less than 0.001)

Back-Propagation NNs

Page 32: Artificial Intelligence CIS 342

Back-Propagation NNs

[Diagram: the same three-layer network as before; input signals propagate forward through weights wij and wjk, and error signals propagate backward]

Page 33: Artificial Intelligence CIS 342

[Diagram: three-layer network with inputs x1 and x2, hidden neurons 3 and 4, output neuron 5 producing y5, weights w13, w23, w14, w24, w35, w45, and a fixed input of –1 feeding each neuron's threshold]

Back-Propagation NNs

Write code for the three-layer neural network shown above

Use the sigmoid activation function, and apply Θ by connecting a fixed input of –1 to weight Θ
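A sketch of this exercise in Python. Assumptions not stated on the slide: a 2-2-1 network trained on XOR with the standard back-propagation delta rule, random initial weights in [-0.5, 0.5], and a learning rate of 0.5:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_xor(alpha=0.5, target_sse=0.001, max_epochs=20000, seed=1):
    """2-2-1 back-propagation network learning XOR.

    Each threshold theta is applied as a weight on a fixed input of -1."""
    rnd = random.Random(seed)
    w = [[rnd.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]  # input i -> hidden j
    th_h = [rnd.uniform(-0.5, 0.5) for _ in range(2)]                   # hidden thresholds
    v = [rnd.uniform(-0.5, 0.5) for _ in range(2)]                      # hidden j -> output
    th_o = rnd.uniform(-0.5, 0.5)                                       # output threshold
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for epoch in range(1, max_epochs + 1):
        sse = 0.0
        for x, yd in data:
            # forward pass
            h = [sigmoid(x[0] * w[0][j] + x[1] * w[1][j] - th_h[j]) for j in range(2)]
            y = sigmoid(h[0] * v[0] + h[1] * v[1] - th_o)
            e = yd - y
            sse += e * e
            # backward pass: error gradients for the output and hidden neurons
            d_out = y * (1 - y) * e
            d_hid = [h[j] * (1 - h[j]) * v[j] * d_out for j in range(2)]
            # weight updates (the threshold's "input" is fixed at -1)
            for j in range(2):
                v[j] += alpha * h[j] * d_out
                th_h[j] += alpha * (-1) * d_hid[j]
                for i in range(2):
                    w[i][j] += alpha * x[i] * d_hid[j]
            th_o += alpha * (-1) * d_out
        if sse < target_sse:
            return epoch, sse      # trained: sum-squared error below target
    return None, sse               # this seed got stuck in a local minimum
```

Depending on the random seed, a run like this may converge in a few thousand epochs or stall in a local minimum, which matches the remark on the next slide that final converged results vary with the initial weights.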

Page 34: Artificial Intelligence CIS 342

[Plot: "Sum-Squared Network Error for 224 Epochs": the error decreases on a log scale from about 10^0 to below 10^-3]

Start with random weights
– Repeat until the sum of the squared errors is below 0.001

– Depending on initial weights, final converged results may vary

Back-Propagation NNs

Page 35: Artificial Intelligence CIS 342

After 224 epochs (896 individual iterations), the neural network has been trained successfully:

Back-Propagation NNs

Inputs x1 x2 | Desired output yd | Actual output y5 | Error e | Sum of squared errors
     1  1    |         0         |      0.0155      | -0.0155 |
     0  1    |         1         |      0.9849      |  0.0151 |        0.0010
     1  0    |         1         |      0.9849      |  0.0151 |
     0  0    |         0         |      0.0175      | -0.0175 |

Page 36: Artificial Intelligence CIS 342

[Diagram: an alternative set of weights and thresholds (+1.0 weights on the input links; hidden thresholds +1.5 and +0.5; output threshold +0.5) that also solves XOR]

No longer limited to linearly separable functions

Another solution:

– Isolate neuron 3, then neuron 4....

Back-Propagation NNs

Page 37: Artificial Intelligence CIS 342

Combine linearly separable functions of neurons 3 and 4:

Back-Propagation NNs

[Plots: (a) decision boundary x1 + x2 – 1.5 = 0 formed by neuron 3; (b) decision boundary x1 + x2 – 0.5 = 0 formed by neuron 4; (c) the two boundaries combined, enclosing the region where XOR is 1]

Page 38: Artificial Intelligence CIS 342

Handwriting recognition

Using Neural Networks

[Diagram: a four-layer network (input layer, two hidden layers, output layer) maps a handwritten character such as "4" or "A" to a binary output code: 0100 => 4, 0101 => 5, 0110 => 6, 0111 => 7, etc.]

Page 39: Artificial Intelligence CIS 342

Advantages of neural networks:
– Given a training dataset, neural networks learn
– Powerful classification and pattern matching applications

Drawbacks of neural networks:
– The solution is a "black box"
– Computationally intensive

Using Neural Networks