Artificial Intelligence
CIS 342
The College of Saint Rose – David Goldschmidt, Ph.D.
Machine learning involves adaptive mechanisms that enable computers to:
– Learn from experience
– Learn by example
– Learn by analogy
Learning capabilities improve the performance of intelligent systems over time
Machine Learning
How do brains work?
– How do human brains differ from those of other animals?

Can we base models of artificial intelligence on the structure and inner workings of the brain?
The Brain
The human brain consists of:
– Approximately 10 billion neurons
– …and 60 trillion connections

The brain is a highly complex, nonlinear, parallel information-processing system
– By firing neurons simultaneously, the brain performs faster than the fastest computers in existence today
The Brain
Building blocks of the human brain:

[Figure: a biological neuron, showing the soma, dendrites, axon, and synapses]
The Brain
An individual neuron has a very simple structure:
– Cell body is called a soma
– Small connective fibers are called dendrites
– Single long fibers are called axons
An army of such elements constitutes tremendous processing power
The Brain
An artificial neural network consists of a number of very simple processors called neurons
– Neurons are connected by weighted links
– The links pass signals from one neuron to another based on predefined thresholds
Artificial Neural Networks
An individual neuron (McCulloch & Pitts, 1943):
– Computes the weighted sum of the input signals
– Compares the result with a threshold value Θ
– If the net input is less than the threshold, the neuron output is –1 (or 0)
– Otherwise, the neuron becomes activated and its output is +1
Artificial Neural Networks
Artificial Neural Networks
[Figure: a single neuron Y; input signals x1, x2, ..., xn arrive over links with weights w1, w2, ..., wn, and the neuron emits output signal Y]

X = x1w1 + x2w2 + ... + xnwn (the net input, compared against the threshold)
Individual neurons adhere to an activation function, which determines whether they propagate their signal (i.e. activate) or not:
For example, the sign function: given the net input X = x1w1 + x2w2 + ... + xnwn,
Y = +1 if X ≥ Θ
Y = –1 if X < Θ

Activation Functions
Activation Functions

[Figure: graphs of the step, sign, sigmoid, and linear activation functions]

Step function: Ystep = 1 if X ≥ 0; Ystep = 0 if X < 0
Sign function: Ysign = +1 if X ≥ 0; Ysign = –1 if X < 0
Sigmoid function: Ysigmoid = 1 / (1 + e^(–X))
Linear function: Ylinear = X
The step and sign activation functions are also often called hard limit functions
We use such functions in decision-making neural networks
– Support classification and other pattern recognition tasks
Activation Functions
Write functions or methods for the activation functions on the previous slide (see the sketch below)
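A minimal sketch in Python (the course does not mandate a language, so the function names here are our own):

import math

def step(X):
    # Step function: 1 if X >= 0, otherwise 0
    return 1 if X >= 0 else 0

def sign(X):
    # Sign function: +1 if X >= 0, otherwise -1
    return 1 if X >= 0 else -1

def sigmoid(X):
    # Sigmoid function: squashes X into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-X))

def linear(X):
    # Linear function: output equals the net input
    return X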
Can an individual neuron learn?
– In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a single-node neural network
– Rosenblatt's perceptron model consists of a single neuron with adjustable synaptic weights, followed by a hard limiter
Perceptrons
Perceptrons
[Figure: a perceptron; inputs x1 and x2 with weights w1 and w2 feed a linear combiner, followed by a hard limiter with threshold Θ that produces output Y]

X = x1w1 + x2w2
Y = Ystep (the step function applied to X against threshold Θ)
Write code for a single two-input neuron (see the sketch below)

Set w1, w2, and Θ through trial and error to obtain a logical AND of inputs x1 and x2
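One possible sketch in Python; the weights below are one combination that works with Θ = 0.2 (they also happen to be the values the training table later converges to):

def neuron(x1, x2, w1, w2, theta):
    # Linear combiner followed by a step hard limiter
    X = x1 * w1 + x2 * w2
    return 1 if X >= theta else 0

# Trial-and-error weights for logical AND:
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', neuron(x1, x2, w1=0.1, w2=0.1, theta=0.2))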
A perceptron:
– Classifies inputs x1, x2, ..., xn into one of two distinct classes A1 and A2
– Forms a linearly separable function defined by:
Perceptrons
x1w1 + x2w2 + ... + xnwn – Θ = 0

[Figure: (a) a two-input perceptron separating classes A1 and A2 along the line x1w1 + x2w2 – Θ = 0; (b) a three-input perceptron separating them along the plane x1w1 + x2w2 + x3w3 – Θ = 0]

A perceptron with three inputs x1, x2, and x3 classifies its inputs into two distinct sets A1 and A2
How does a perceptron learn?
– A perceptron has initial (often random) weights, typically in the range [–0.5, 0.5]
– Apply an established training dataset
– Calculate the error as expected output minus actual output:

error e = Yexpected – Yactual

– Adjust the weights to reduce the error
Perceptrons
How do we adjust a perceptron's weights to produce Yexpected?
– If e is positive, we need to increase Yactual (and vice versa)
– Use this formula:

wi = wi + Δwi, where Δwi = α × xi × e

α is the learning rate (between 0 and 1)
e is the calculated error

For example, with α = 0.1, xi = 1, and e = –1: Δwi = 0.1 × 1 × (–1) = –0.1, so wi decreases by 0.1

Perceptrons
Train a perceptron to recognize logical AND
Perceptron Example – AND
Use threshold Θ = 0.2 and learning rate α = 0.1
Epoch | Inputs x1 x2 | Desired output Yd | Initial weights w1 w2 | Actual output Y | Error e | Final weights w1 w2
1 | 0 0 | 0 | 0.3 -0.1 | 0 |  0 | 0.3 -0.1
1 | 0 1 | 0 | 0.3 -0.1 | 0 |  0 | 0.3 -0.1
1 | 1 0 | 0 | 0.3 -0.1 | 1 | -1 | 0.2 -0.1
1 | 1 1 | 1 | 0.2 -0.1 | 0 |  1 | 0.3  0.0
2 | 0 0 | 0 | 0.3  0.0 | 0 |  0 | 0.3  0.0
2 | 0 1 | 0 | 0.3  0.0 | 0 |  0 | 0.3  0.0
2 | 1 0 | 0 | 0.3  0.0 | 1 | -1 | 0.2  0.0
2 | 1 1 | 1 | 0.2  0.0 | 1 |  0 | 0.2  0.0
3 | 0 0 | 0 | 0.2  0.0 | 0 |  0 | 0.2  0.0
3 | 0 1 | 0 | 0.2  0.0 | 0 |  0 | 0.2  0.0
3 | 1 0 | 0 | 0.2  0.0 | 1 | -1 | 0.1  0.0
3 | 1 1 | 1 | 0.1  0.0 | 0 |  1 | 0.2  0.1
4 | 0 0 | 0 | 0.2  0.1 | 0 |  0 | 0.2  0.1
4 | 0 1 | 0 | 0.2  0.1 | 0 |  0 | 0.2  0.1
4 | 1 0 | 0 | 0.2  0.1 | 1 | -1 | 0.1  0.1
4 | 1 1 | 1 | 0.1  0.1 | 1 |  0 | 0.1  0.1
5 | 0 0 | 0 | 0.1  0.1 | 0 |  0 | 0.1  0.1
5 | 0 1 | 0 | 0.1  0.1 | 0 |  0 | 0.1  0.1
5 | 1 0 | 0 | 0.1  0.1 | 0 |  0 | 0.1  0.1
5 | 1 1 | 1 | 0.1  0.1 | 1 |  0 | 0.1  0.1

Threshold: Θ = 0.2; learning rate: α = 0.1
Repeat until convergence
– i.e. until the final weights do not change and there is no error (see the sketch below)

Perceptron Example – AND

Use threshold Θ = 0.2 and learning rate α = 0.1
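A sketch of this training loop in Python, using the dataset, threshold, learning rate, and initial weights (0.3 and -0.1) from the table; the rounding step only guards against floating-point drift:

def train_perceptron(data, w1, w2, theta=0.2, alpha=0.1):
    epoch = 0
    while True:
        epoch += 1
        total_error = 0
        for x1, x2, y_desired in data:
            y_actual = 1 if x1 * w1 + x2 * w2 >= theta else 0
            e = y_desired - y_actual            # error e = Yexpected - Yactual
            w1 = round(w1 + alpha * x1 * e, 6)  # delta wi = alpha * xi * e
            w2 = round(w2 + alpha * x2 * e, 6)
            total_error += abs(e)
        if total_error == 0:                    # converged: a full epoch with no error
            return w1, w2, epoch

AND_DATA = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
print(train_perceptron(AND_DATA, w1=0.3, w2=-0.1))  # -> (0.1, 0.1, 5), matching the table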
Two-dimensional plot of the logical AND operation:

A single perceptron can be trained to recognize any linearly separable function
– Can we train a perceptron to recognize logical OR?
– How about logical exclusive-OR (i.e. XOR)?

Perceptron Example – AND
[Figure: two-dimensional plots of (a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2), marking which input combinations produce output 1]
Two-dimensional plots of logical OR and XOR:
Perceptron – OR and XOR
[Figure: the same plots for (b) OR and (c) Exclusive-OR; a single straight line can separate the OR outputs, but not the XOR outputs]
Modify your code to:
– Calculate the error at each step
– Modify weights, if necessary (i.e. if the error is non-zero)
– Loop until all error values are zero for a full epoch

Modify your code to learn to recognize the logical OR operation
– Then try to recognize the XOR operation... (see the sketch below)
Perceptron Coding Exercise
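A sketch of the experiment, reusing the loop from the earlier training sketch but with an epoch cutoff (our own addition) so the XOR attempt terminates:

def train_with_cutoff(data, w1, w2, theta=0.2, alpha=0.1, max_epochs=100):
    # Same loop as before, but give up after max_epochs
    for epoch in range(1, max_epochs + 1):
        total_error = 0
        for x1, x2, y_desired in data:
            y_actual = 1 if x1 * w1 + x2 * w2 >= theta else 0
            e = y_desired - y_actual
            w1 = round(w1 + alpha * x1 * e, 6)
            w2 = round(w2 + alpha * x2 * e, 6)
            total_error += abs(e)
        if total_error == 0:
            return w1, w2, epoch
    return None                                 # did not converge

OR_DATA  = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
XOR_DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(train_with_cutoff(OR_DATA,  0.3, -0.1))   # converges: OR is linearly separable
print(train_with_cutoff(XOR_DATA, 0.3, -0.1))   # None: XOR is not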
[Figure: a multilayer network with an input layer, a middle (hidden) layer, and an output layer]
Multilayer neural networks consist of:
– An input layer of source neurons
– One or more hidden (middle) layers of computational neurons
– An output layer of computational neurons

Input signals are propagated in a layer-by-layer feedforward manner
Multilayer Neural Networks
Multilayer Neural Networks
[Figure: a multilayer network; input signals enter the input layer, pass through the middle layer, and emerge from the output layer as output signals]
Multilayer Neural Networks
[Figure: a multilayer network with an input layer, first hidden layer, second hidden layer, and output layer; input signals flow in one side and output signals flow out the other]
Multilayer Neural Networks
[Figure: a multilayer network with input-layer neurons 1..n receiving inputs x1..xn, hidden-layer neurons 1..m connected by weights wij, and output-layer neurons 1..l producing outputs y1..yl through weights wjk; input signals propagate forward and error signals propagate backward]
XINPUT = x1 (input-layer neurons simply pass their input signals on)
XH = x1w11 + x2w21 + ... + xiwi1 + ... + xnwn1 (net input to the first hidden neuron)
XOUTPUT = yH1w11 + yH2w21 + ... + yHjwj1 + ... + yHmwm1 (net input to the first output neuron)
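A sketch of this layer-by-layer propagation in Python (the names are our own; thresholds are omitted here to match the sums above):

import math

def layer_forward(signals, weights):
    # weights[i][j] is the link weight from source neuron i to
    # destination neuron j; each destination computes the weighted
    # sum of its inputs and passes it through the sigmoid
    n_out = len(weights[0])
    return [1.0 / (1.0 + math.exp(-sum(s * w[j] for s, w in zip(signals, weights))))
            for j in range(n_out)]

# Propagate inputs through the hidden layer, then the output layer:
# y_hidden = layer_forward([x1, ..., xn], w_input_to_hidden)
# y_output = layer_forward(y_hidden, w_hidden_to_output)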
Three-layer network:

[Figure: a three-layer network; inputs x1 and x2 feed hidden neurons 3 and 4 through weights w13, w14, w23, and w24, and neurons 3 and 4 feed output neuron 5 through weights w35 and w45, producing output y5]

Multilayer Neural Networks
Commercial-quality neural networks often incorporate 4 or more layers
– Each layer consists of about 10 to 1000 individual neurons

Experimental and research-based neural networks often use 5 or 6 (or more) layers
– Overall, millions of individual neurons may be used
Multilayer Neural Networks
A back-propagation neural network is a multilayer neural network that propagates error backwards through the network as it learns
– Weights are modified based on the calculated error
– Training is complete when the error is below a specified threshold (e.g. less than 0.001)
Back-Propagation NNs
Back-Propagation NNs
[Figure: the multilayer network shown earlier; input signals propagate forward while error signals propagate backward from the output layer through the hidden layer]
[Figure: the three-layer network shown earlier, with inputs x1 and x2, hidden neurons 3 and 4 (weights w13, w14, w23, w24), and output neuron 5 (weights w35, w45)]
Back-Propagation NNs
Write code for the three-layer neural network below
– Use the sigmoid activation function; apply Θ by connecting a fixed input of –1 to weight Θ (see the sketch below)
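A minimal sketch in Python, assuming the standard delta-rule form of back-propagation (sigmoid derivative y(1 - y)); the learning rate and random seed are our own choices, and as the next slide notes, results depend on the initial weights:

import math, random

def sigmoid(X):
    return 1.0 / (1.0 + math.exp(-X))

random.seed(1)
# Inputs x1, x2 -> hidden neurons 3 and 4 -> output neuron 5.
# Each neuron gets an extra fixed input of -1 whose weight is its threshold.
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]  # w35, w45, theta5
alpha = 0.5
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for epoch in range(1, 100001):
    sse = 0.0  # sum of squared errors over the epoch
    for (x1, x2), yd in XOR_DATA:
        # Forward pass (the -1 bias input contributes the -theta term)
        y_h = [sigmoid(x1 * w[0] + x2 * w[1] - w[2]) for w in w_hidden]
        y5 = sigmoid(y_h[0] * w_out[0] + y_h[1] * w_out[1] - w_out[2])
        e = yd - y5
        sse += e * e
        # Backward pass: error gradients, computed before any weights change
        delta5 = y5 * (1 - y5) * e
        delta_h = [y * (1 - y) * w_out[j] * delta5 for j, y in enumerate(y_h)]
        # Weight updates
        for j in range(2):
            w_out[j] += alpha * y_h[j] * delta5
            w_hidden[j][0] += alpha * x1 * delta_h[j]
            w_hidden[j][1] += alpha * x2 * delta_h[j]
            w_hidden[j][2] += alpha * -1 * delta_h[j]
        w_out[2] += alpha * -1 * delta5
    if sse < 0.001:  # training complete
        print("converged after", epoch, "epochs")
        break
else:
    print("did not converge; try different initial weights")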
[Plot: sum-squared network error over 224 epochs of training, decreasing from roughly 10^1 to below 10^-3 on a logarithmic scale]

Start with random weights
– Repeat until the sum of the squared errors is below 0.001
– Depending on initial weights, final converged results may vary
Back-Propagation NNs
After 224 epochs (896 individual iterations), the neural network has been trained successfully:
Back-Propagation NNs
Inputs x1 x2 | Desired output yd | Actual output y5 | Error e
1 1 | 0 | 0.0155 | -0.0155
0 1 | 1 | 0.9849 |  0.0151
1 0 | 1 | 0.9849 |  0.0151
0 0 | 0 | 0.0175 | -0.0175

Sum of squared errors: 0.0010
[Figure: a set of weights that solves XOR: all four input-to-hidden weights are +1.0; hidden neuron 3 has threshold +1.5 and hidden neuron 4 has threshold +0.5; the hidden-to-output weights are w35 = -2.0 and w45 = +1.0, and output neuron 5 has threshold +0.5]
No longer limited to linearly separable functions
Another solution:
– Isolate neuron 3, then neuron 4...
Back-Propagation NNs
Combine linearly separable functions of neurons 3 and 4:
Back-Propagation NNs
[Figure: (a) the decision boundary of neuron 3, x1 + x2 – 1.5 = 0; (b) the decision boundary of neuron 4, x1 + x2 – 0.5 = 0; (c) the two boundaries combined, separating the XOR classes]
Handwriting recognition
Using Neural Networks
[Figure: a handwriting-recognition network with an input layer reading the scanned character, two hidden layers, and an output layer that emits a binary code for the recognized digit, e.g. 0100 => 4, 0101 => 5, 0110 => 6, 0111 => 7, etc.]
Advantages of neural networks:
– Given a training dataset, neural networks learn
– Powerful classification and pattern matching applications

Drawbacks of neural networks:
– Solution is a "black box"
– Computationally intensive
Using Neural Networks