09 Artificial Neural Networks and Classification


Artificial Neural Networks and Classification

An artificial neural network is a simple brain-like device that can learn by adjusting connections between its neurons.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    2/43

     

     The brain as a computer

  • 8/18/2019 09 Artificial Neural Networks and Classification

    3/43

     

     The brain’s architecture

Human (and animal) brains have a 'computer' architecture which consists of a complex web of about 10^11 highly inter-connected processing units called neurons.

Processing involves signals being sent from neuron to neuron by complicated electrochemical reactions, in a highly parallel manner.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    4/43

     

The neuron

A neuron is a nerve cell consisting of:
a cell body (soma) containing a nucleus
a number of fibres called dendrites branching out from the body
a single long fibre called the axon (a centimetre or longer).

The axon branches and connects to the dendrites of other neurons:
the connecting junction is called the synapse
each neuron connects to between a dozen and 100,000 other neurons.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    5/43

     

A real neuron

[Figure: image of a real neuron.]

  • 8/18/2019 09 Artificial Neural Networks and Classification

    6/43

     

Signal propagation

Chemical transmitter substances are released from the synapses and enter the dendrites.
This raises or lowers the electrical potential of the cell body:
synapses that raise potential are called excitatory; those that lower it are inhibitory.
When a threshold is reached, an electrical pulse, the action potential, is sent down the axon (firing). This spreads into the axon's branches, reaching synapses and releasing transmitters into the cell bodies of other neurons.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    7/43

     

Brain versus computer

Storage capacity:
the brain has more neurons than a computer has bits.
Speed:
the brain is much slower than a computer: a neuron has a firing speed of 10^-3 secs, compared to a computer switching speed of 10^-11 secs.
The brain relies on massive parallelism for performance: you can recognise your mother in 0.1 secs.

The brain is more suited to intelligence processing and learning. It is good at forming associations; this seems to be the basis of learning.
It is more fault tolerant: neurons die all the time and computation continues.
Task performance exhibits graceful degradation, in contrast to the brittleness of computers.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    8/43

     

Artificial neural networks

  • 8/18/2019 09 Artificial Neural Networks and Classification

    9/43

     

What is an artificial neural network?

An artificial neural network (ANN) is a grossly oversimplified version of the brain's architecture:
it has far fewer 'neurons' (several hundred or thousand)
it has a much simpler internal structure
the firing mechanism is less complex
the signals consist of real numbers passed from one neuron to another.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    10/43

     

How does a network behave?

Most ANNs can be regarded as input-output devices:
numerical input is propagated through the network from neuron to neuron till it reaches the output.
The connections between neurons have numerical weights, which are used to combine the signals reaching a neuron.
Learning involves establishing the weight values (strengths) to achieve a particular goal. In theory the strengths could be programmed rather than learnt, but for the most part this would be impossibly tedious.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    11/43

     

Designing a network

Creating an ANN requires the following to be specified:
Network topology: the number of units, the pattern of interconnectivity amongst them, and the mathematical type of the weights.
Transfer function: this combines the inputs impinging on the unit and produces the unit activation level, which then becomes the output signal.
Representation for examples.
Learning law: this states how weights are to be modified to achieve the learning goal.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    12/43

     

Network topology - neurons and layers

Specifies how many nodes (neurons) there are and how they are connected:
in a fully connected network, each node is connected to every other.
Often networks are organised in layers (slabs), with no connections between nodes in a layer, only across layers.
The first layer is the input layer; the last, the output layer. Layers between the input and output layers are called hidden.
The input units typically do not carry out internal computation, i.e. do not have transfer functions; they merely pass on their signal values.
The output units send their signals directly to the outside world.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    13/43

     

Network topology - weights

Weights are usually real-valued. At the start of learning, their values are often set randomly.
If there is a connection from a to b, then a has influence over the activation value of b.
Excitatory influence: high activation in unit a contributes to high activation in unit b; modelled by a positive weight.
Inhibitory influence: high activation in unit a contributes to low activation in unit b; modelled by a negative weight.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    14/43

     

Network topology - flow of computation

Although connections are uni-directional, some networks have pairs of units connected in both directions: there is a connection from unit a to unit b and one back from unit b to unit a.
Networks in which there is no looping back of connections are called feed-forward: signals are 'fed forward' from input through to output.
Networks in which outputs are eventually fed back into the network as inputs are called recurrent.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    15/43

     

Examples of feed-forward topologies

[Figure: two example networks. A single layer network: a 6-node input layer connected directly to a 2-node output layer. A two layer network with 1 hidden layer: a 4-node input layer, a 4-node hidden layer and a 1-node output layer.]
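To make the flow of computation concrete, here is a minimal sketch (not from the slides) of a forward pass through the second topology above: 4 inputs, 4 hidden units, 1 output. The weight values and the use of a sigmoid squashing function are illustrative assumptions.

    import math

    def sigmoid(s):
        return 1.0 / (1.0 + math.exp(-s))

    def forward(x, w_hidden, w_output):
        # each hidden unit: weighted sum of the inputs, then activation
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
        # the single output unit combines the hidden activations
        return sigmoid(sum(w * h for w, h in zip(w_output, hidden)))

    w_hidden = [[0.1, -0.2, 0.3, 0.0] for _ in range(4)]  # 4 hidden units
    w_output = [0.5, -0.5, 0.25, 0.25]                    # 1 output unit
    print(forward([1.0, 0.0, 0.5, 0.2], w_hidden, w_output))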

  • 8/18/2019 09 Artificial Neural Networks and Classification

    16/43

     

The transfer function - combining input signals

The input signals to a neuron must be combined into a single value, the activation level, to be output.
Usually this transfer takes place in two stages: first the inputs are combined, and then passed through another function to produce the output.
The most common method of combination is the weighted sum:

    sum = w1 x1 + ... + wn xn

Here xi is the signal and wi is the weight on connection i, and n is the number of input signals.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    17/43

     

The transfer function - the activation level

The weighted sum is passed through an activation function to produce the output signal (activation level) y′. Commonly used functions are:
Linear: the output is just the weighted sum.
Linear threshold (step function): the weighted sum is thresholded at a value c; if it is less than c, then y′ = 0, otherwise y′ = 1.
Sigmoid response (logistic) function: a continuous version of the step function which produces graceful degradation around the 'step' at c:

    y′ = 1 / (1 + e^-(sum - c))
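The three functions can be sketched directly; the threshold c and the sample inputs below are illustrative values, not from the slides.

    import math

    def weighted_sum(weights, signals):
        return sum(w * x for w, x in zip(weights, signals))

    def linear(s):
        return s

    def step(s, c=0.5):
        return 1 if s >= c else 0

    def sigmoid(s, c=0.5):
        return 1.0 / (1.0 + math.exp(-(s - c)))

    s = weighted_sum([0.3, -0.2, 0.5], [1.0, 0.4, 0.8])
    print(linear(s), step(s), sigmoid(s))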

  • 8/18/2019 09 Artificial Neural Networks and Classification

    18/43

     

Activation function graphs

[Figure: graphs of the three activation functions. Linear: a straight line through 0. Step: output jumps from 0 to 1 at the threshold c. Sigmoid: a smooth S-curve rising from 0 to 1 around c.]

  • 8/18/2019 09 Artificial Neural Networks and Classification

    19/43

     

Example

[Figure: a worked numeric example of the transfer function; the diagram is lost in transcription apart from a weight label w1 = 0.3.]

  • 8/18/2019 09 Artificial Neural Networks and Classification

    20/43

     

Learning with ANNs

  • 8/18/2019 09 Artificial Neural Networks and Classification

    21/43

     

What tasks can a network learn?

Networks can be trained for the following tasks:
Classification
Pattern association, e.g. English verbs mapped to their past tense
Content addressable/associative memory, e.g. can recall/restore a whole image when provided with a part of it.

These all involve mappings. The mapping of input to output is determined by the settings of all the weights in the network (the weight vector); this is what is learnt.
The network node configuration together with the weight vector is the knowledge structure.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    22/43

     

Learning laws

Learning provides a means of finding the weight settings to implement a mapping. This is only possible if the network is capable of representing the mapping: the more complex the mapping, the larger the network that will be required, including a greater number of hidden layers.
Initially, weights are set at random and altered in response to the training data.
A regime for weight alteration to achieve the required mapping is called a learning law.
Even if a network can represent a mapping, a particular learning law may not be able to learn it.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    23/43

     

Representation of training examples

Unlike decision trees, which handle both discrete and continuous (numeric) attributes, ANNs can handle only the latter.
All discrete attributes must be converted (encoded) to be numeric. This also applies to the class.
Several ways are available, and the choice affects the success of learning.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    24/43

     

Description attributes

It is desirable for all attributes to have values in the same range; this is usually taken to be 0 to 1.
Achieved for numeric attributes using normalisation:

    value → (value - min value) / (max value - min value)

For discrete attributes we can use:
1-out-of-N encoding (distributed): N binary (0-1) units are used to represent the N values of the attribute, one for each.
local encoding: values are mapped to numbers in the range 0 to 1; more suited to ordered values.
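All three conversions are simple to state in code. A minimal sketch, with illustrative attribute values:

    def normalise(value, min_value, max_value):
        # map a numeric value into the range 0..1
        return (value - min_value) / (max_value - min_value)

    def one_out_of_n(value, values):
        # distributed encoding: one binary unit per possible value
        return [1 if v == value else 0 for v in values]

    def local_encoding(value, ordered_values):
        # single unit: position of the value scaled into 0..1
        i = ordered_values.index(value)
        return i / (len(ordered_values) - 1)

    print(normalise(20, -50, 50))                                   # 0.7
    print(one_out_of_n('overcast', ['sunny', 'overcast', 'rain']))  # [0, 1, 0]
    print(local_encoding('normal', ['low', 'normal', 'high']))      # 0.5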

  • 8/18/2019 09 Artificial Neural Networks and Classification

    25/43

     

Class attribute

1-out-of-N or local encoding can be used for the class.
The network output after learning is usually only approximate: e.g. in a binary class problem with classes represented by 0 and 1, the network might output 0.9, and this would be taken as '1'.
Using 1-out-of-N encoding allows for a probabilistic interpretation, e.g.:
classes for the car domain: unacc, acc, good, vgood
can be represented with four binary units, e.g. acc → (0, 1, 0, 0)
Output of (0 ...
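Reading a class back from an approximate 1-out-of-N output is usually a matter of taking the unit with the highest activation; a minimal sketch (the output values are illustrative):

    classes = ['unacc', 'acc', 'good', 'vgood']
    output = [0.1, 0.7, 0.15, 0.05]   # approximate network output
    predicted = classes[output.index(max(output))]
    print(predicted)                  # acc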

  • 8/18/2019 09 Artificial Neural Networks and Classification

    26/43

     

Network configuration

Encoding of training examples affects network size. The input layer will have:
one unit for each numeric attribute
one for each locally encoded discrete attribute
1 for each binary discrete attribute
k for each distributed encoding of a discrete attribute, where the attribute has k > 2 values.
Usually have a small number of hidden layers (one or two).

  • 8/18/2019 09 Artificial Neural Networks and Classification

    27/43

     

Pyramid structure

Hidden layers are used to reduce the dimensionality of the input.
A network has a pyramid structure if:
the first hidden layer has fewer nodes than the input layer
each hidden layer has fewer than its predecessor
the output layer has fewest.
The pyramid structure facilitates learning. In classification, each hidden layer appears to partially classify the examples, until the actual classes are reached in the output layer.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    28/43

     

The learning process

Classification learning uses a feedback mechanism:
An example is fed through the network using the existing weights.
The output value is O; the correct output value, i.e. the class in the example, is T (target).
If O ≠ T, some or all of the weights are changed slightly.
The extent of the change usually depends on T - O, called the error.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    29/43

     

The delta rule

A weight, wi, on a connection carrying signal, xi, can be modified by adding an amount Δwi proportional to the error:

    Δwi = η (T - O) xi

where η is the learning rate: a positive constant, usually set at about 0.1 and gradually decreased during learning.
The update formula for wi is then:

    wi ← wi + Δwi
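The rule is one line per weight; a minimal sketch with illustrative values for η, the inputs and the target:

    eta = 0.1                      # learning rate

    def delta_rule(weights, x, T, O):
        # w_i <- w_i + eta * (T - O) * x_i
        return [w + eta * (T - O) * xi for w, xi in zip(weights, x)]

    weights = [0.2, -0.4, 0.1]
    x = [1.0, 0.5, 0.0]            # input signals
    print(delta_rule(weights, x, T=1, O=0))   # ~ [0.3, -0.35, 0.1]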

  • 8/18/2019 09 Artificial Neural Networks and Classification

    30/43

     

Training epochs

For each example in the training set:
the description attribute values are fed as input to the network and propagated through to the output
each weight is updated.
This constitutes one epoch, or cycle, of learning.
The process is repeated till it is decided to stop; many thousands of epochs may be necessary.
The final set of weights represents the learned mapping.
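Putting the delta rule inside an epoch loop gives the whole training procedure for a single step-function unit. A minimal sketch; the tiny training set (logical OR, with a bias input fixed at 1) and all parameter values are illustrative:

    eta = 0.1
    weights = [0.0, 0.0, 0.0]          # bias weight plus one weight per input

    def predict(x):
        s = sum(w * xi for w, xi in zip(weights, [1.0] + x))
        return 1 if s >= 0 else 0      # step function thresholded at 0

    examples = [([0.0, 0.0], 0), ([0.0, 1.0], 1),
                ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

    for epoch in range(100):           # repeat until we decide to stop
        for x, T in examples:
            O = predict(x)
            xs = [1.0] + x
            weights = [w + eta * (T - O) * xi for w, xi in zip(weights, xs)]

    print(weights, [predict(x) for x, _ in examples])   # learns OR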

  • 8/18/2019 09 Artificial Neural Networks and Classification

    31/43

     

Worked example - golf domain

Conversion of attributes:

    Attribute     Original values          Converted values
    Outlook       sunny, overcast, rain    (1,0,0), (0,1,0), (0,0,1)
    Temperature   -50 to 50 °C             0 to 1:  T ← (T+50)/100
    Humidity      low, normal, high        (1,0,0), (0,1,0), (0,0,1)
    Windy         true, false              1, 0
    Class         yes, no                  Play golf: 1, 0
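A minimal sketch of encoding one example with this table (the example's attribute values are illustrative):

    def encode(outlook, temperature, humidity, windy):
        enc = [1 if outlook == v else 0 for v in ('sunny', 'overcast', 'rain')]
        enc.append((temperature + 50) / 100)   # -50..50 -> 0..1
        enc += [1 if humidity == v else 0 for v in ('low', 'normal', 'high')]
        enc.append(1 if windy else 0)
        return enc

    print(encode('sunny', 25, 'high', False))
    # [1, 0, 0, 0.75, 0, 0, 1, 0]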

  • 8/18/2019 09 Artificial Neural Networks and Classification

    32/43

     

Network configuration

Use a single layer network (no hidden units) with a step function to illustrate the delta rule.
Initialise the weights as shown; set η = 0.1.

[Figure: a single layer network with a bias input and inputs Sunny, Overcast, Rain, Temperature, Low, Normal, High and Windy, connected by weights w0 (the bias weight) to w8 to one output unit; the diagram's initial weight values, e.g. w0 = 0.2, are only partly recoverable from the transcript.]

  • 8/18/2019 09 Artificial Neural Networks and Classification

    33/43

     

Feeding a training example

The first example is (sunny, ...

  • 8/18/2019 09 Artificial Neural Networks and Classification

    34/43

     

The backpropagation algorithm

  • 8/18/2019 09 Artificial Neural Networks and Classification

    35/43

     

Learning in multi-layered networks

Networks with one or more hidden layers are necessary to represent complex mappings.
In such a network the basic delta learning law is insufficient: it only defines how to update weights in output units (it uses T - O).
To update hidden node weights, we have to define their error. This is achieved by the backpropagation algorithm.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    36/43

     

The backpropagation process

Inputs are fed through the network in the usual way: this is the forward pass.
Output layer weights are adjusted based on errors ...
... then weights in the previous layer are adjusted ...
... and so on back to the first layer: this is the backwards pass (or backpropagation).
Errors determined in a layer are used to determine those in the previous layer.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    37/43

     

Illustrating the error contribution

A hidden node is partially 'credited' for errors in the next layer; these errors are created in the forward pass.

[Figure: a hidden node connected by weights w1 ... wk to nodes in the next layer carrying error1 ... errork.]

    error contribution = w1 · error1 + ... + wk · errork

  • 8/18/2019 09 Artificial Neural Networks and Classification

    38/43

     

The backpropagation algorithm

A backpropagation network is a multi-layered feed-forward network using the sigmoid response activation function.

Backpropagation algorithm:
Initialise all network weights to small random numbers (between -0.05 and 0.05) ...
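The rest of the algorithm follows the process sketched on the earlier slides. A minimal sketch for one hidden layer and one output unit; the exact update formulas here, using the sigmoid derivative O(1 - O), are standard backpropagation details assumed rather than taken from this transcript:

    import math, random

    random.seed(0)
    eta = 0.1
    n_in, n_hid = 3, 2
    w_hid = [[random.uniform(-0.05, 0.05) for _ in range(n_in)]
             for _ in range(n_hid)]
    w_out = [random.uniform(-0.05, 0.05) for _ in range(n_hid)]

    def sigmoid(s):
        return 1.0 / (1.0 + math.exp(-s))

    def train_one(x, T):
        global w_out, w_hid
        # forward pass
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hid]
        O = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
        # output error; the sigmoid derivative is O * (1 - O)
        err_out = O * (1 - O) * (T - O)
        # hidden errors: each node's weighted contribution to the output error
        err_hid = [h[i] * (1 - h[i]) * w_out[i] * err_out for i in range(n_hid)]
        # backwards pass: delta-rule-style weight updates
        w_out = [w + eta * err_out * h[i] for i, w in enumerate(w_out)]
        w_hid = [[w + eta * err_hid[i] * xj for xj, w in zip(x, ws)]
                 for i, ws in enumerate(w_hid)]

    train_one([1.0, 0.5, 0.0], T=1)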

  • 8/18/2019 09 Artificial Neural Networks and Classification

    39/43

     

Termination conditions

Many thousands of iterations (epochs or cycles) may be necessary to learn a classification mapping; the more complex the mapping to be learnt, the more cycles will be required.
Several termination conditions are used:
stop after a given number of epochs
stop when the error on the training examples (or on a separate validation set) falls below some agreed level.
Stopping too soon results in underfitting; too late, in overfitting.
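Both stopping rules fit naturally around the epoch loop; a minimal sketch in which max_epochs, target_error and the error() placeholder are illustrative assumptions:

    max_epochs, target_error = 10000, 0.01

    def error(examples):
        # placeholder: total error over the (validation) examples
        return 0.005

    validation = []
    for epoch in range(max_epochs):           # stop after a fixed number of epochs
        # ... one epoch of training goes here ...
        if error(validation) < target_error:  # or stop once the error is low enough
            break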

  • 8/18/2019 09 Artificial Neural Networks and Classification

    40/43

     

Backpropagation as a search

Learning is a search for a network weight vector to implement the required mapping.
The search is hill-climbing, or rather descending, called steepest gradient descent. The heuristic used is the total of the (T - O) errors.

  • 8/18/2019 09 Artificial Neural Networks and Classification

    41/43

     

Problems with the search

The size of step is controlled by the learning rate parameter. This must be tuned for individual problems: if the step is too large, search becomes inefficient.
The error surface tends to have extensive flat areas: troughs with very little slope. It can be difficult to reduce error in such regions:
weights have to move large distances and it can be hard to determine the right direction
high numerical accuracy is required, e.g. ...

  • 8/18/2019 09 Artificial Neural Networks and Classification

    42/43

     

The trained network

After learning, a backpropagation network may be used as a classifier:
descriptions of new examples are fed into the network and the class is read from the output layer
for 1-out-of-N output representations, exact values of 0 and 1 will not usually be obtained.
Sensitivity analysis (using test data) determines which attributes are most important for classification. An attribute is regarded as important if small changes in its value affect the classification.
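A minimal sketch of that idea: perturb each input slightly and measure how much the output moves. The network() function is a placeholder standing in for a trained classifier:

    def network(x):
        # placeholder for a trained network's output on input x
        return sum(x) / len(x)

    def sensitivity(x, i, delta=0.01):
        # change attribute i by a small amount, measure the output shift
        x2 = list(x)
        x2[i] += delta
        return abs(network(x2) - network(x)) / delta

    x = [0.2, 0.9, 0.5]
    print([sensitivity(x, i) for i in range(len(x))])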

  • 8/18/2019 09 Artificial Neural Networks and Classification

    43/43

Backpropagation versus ID3

These two algorithms are the giants of classification learning.
Which is better? The jury is still out. There are major differences:
ID3 favours discrete attributes, Backprop favours continuous (but each handles both types).
Backprop handles noise well; by using pruning, so does ID3.
Backprop is much slower than ID3 and may get stuck.
ID3 tells us which attributes are important; Backprop does this (to some extent) with sensitivity analysis.
Backprop's learned knowledge structure (the weight vector) is not understandable, whereas an ID3 tree can be comprehended (although this is difficult if the tree is large).