Neural Networks and Deep Learning I (jbg/teaching/CSCI_5622/14a.pdf)

Neural Networks and Deep Learning I CSCI 5622 Fall 2015

Transcript of Neural Networks and Deep Learning I (jbg/teaching/CSCI_5622/14a.pdf)

Page 1

Neural Networks and Deep Learning I, CSCI 5622, Fall 2015

Page 2

A Brief History Of Machine Learning

ü 1962  Frank Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms

  Perceptron can learn anything you can program it to do.

Page 3

A Brief History Of Machine Learning

ü 1969  Minsky & Papert, Perceptrons: An Introduction to Computational Geometry

  There are many things a perceptron can’t, in principle, learn to do

Page 4

A Brief History Of Machine Learning

ü 1970-1985  Attempts to develop symbolic rule discovery algorithms

ü 1986  Rumelhart, Hinton, & Williams, Back propagation

  Overcame many of the Minsky & Papert objections

  Neural nets popular in cog sci and AI

circa 1990

Page 5

A Brief History Of Machine Learning

ü 1990-2005  Bayesian approaches

•  take the best ideas from neural networks: statistical computing, statistical learning

  Support-Vector Machines

•  convergence proofs (unlike neural nets)

  A few old timers keep playing with neural nets

•  Hinton, LeCun, Bengio

  Neural nets banished from NIPS!

Page 6

A Brief History Of Neural Networks

ü 2005-2010  Attempts to resurrect neural nets with

•  unsupervised pretraining

•  probabilistic neural nets

•  alternative learning rules

Page 7

A Brief History Of Neural Networks

ü 2010-present  Most of the alternative techniques discarded in favor of 1980’s-style neural nets with …

•  lots more training data

•  lots more computing cycles

•  a few important tricks that improve training and generalization (mostly from Hinton)

•  rebranding: Deep Learning

Page 8

Brainlike Computers, Learning From Experience

Erin Lubin/The New York Times

Kwabena Boahen holding a biologically inspired processor attached to a robotic arm in a laboratory at Stanford University.

By JOHN MARKOFF | Published: December 28, 2013

PALO ALTO, Calif. — Computers have entered the age when they are able to learn from their own mistakes, a development that is about to turn the digital world on its head.

The first commercial version of the new kind of computer chip is scheduled to be released in 2014. Not only can it automate tasks that now require painstaking programming — for example, moving a robot’s arm smoothly and efficiently — but it can also sidestep and even tolerate errors, potentially making the term “computer crash” obsolete.

The new computing approach, already in use by some large technology companies, is based on the biological nervous system, specifically on how neurons react to stimuli and connect with other neurons to interpret information. It allows computers to absorb new information while carrying out a task, and adjust what they do based on the changing signals.

In coming years, the approach will make possible a new generation of artificial intelligence systems that will perform some functions that humans do with ease: see, speak, listen, navigate, manipulate and control. That can hold enormous consequences for tasks like facial and speech recognition, navigation and planning, which are still in elementary stages and rely heavily on human programming.

Designers say the computing style can clear the way for robots that can safely walk and drive in the physical world, though a thinking or conscious computer, a staple of science fiction, is still far off on the digital horizon.


“We’re moving from engineering computing systems to something that has many of the characteristics of biological computing,” said Larry Smarr, an astrophysicist who directs the California Institute for Telecommunications and Information Technology, one of many research centers devoted to developing these new kinds of computer circuits.

Conventional computers are limited by what they have been programmed to do. Computer vision systems, for example, only “recognize” objects that can be identified by the statistics-oriented algorithms programmed into them. An algorithm is like a recipe, a set of step-by-step instructions to perform a calculation.

But last year, Google researchers were able to get a machine-learning algorithm, known as a neural network, to perform an identification task without supervision. The network scanned a database of 10 million images, and in doing so trained itself to recognize cats.

In June, the company said it had used those neural network techniques to develop a new search service to help customers find specific photos more accurately.

The new approach, used in both hardware and software, is being driven by the explosion of scientific knowledge about the brain. Kwabena Boahen, a computer scientist who leads Stanford’s Brains in Silicon research program, said that is also its limitation, as scientists are far from fully understanding how brains function.

“We have no clue,” he said. “I’m an engineer, and I build things. There are these highfalutin theories, but give me one that will let me build something.”

Until now, the design of computers was dictated by ideas originated by the mathematician John von Neumann about 65 years ago. Microprocessors perform operations at lightning speed, following instructions programmed using long strings of 1s and 0s. They generally store that information separately in what is known, colloquially, as memory, either in the processor itself, in adjacent storage chips or in higher capacity magnetic disk drives.

The data — for instance, temperatures for a climate model or letters for word processing — are shuttled in and out of the processor’s short-term memory while the computer carries out the programmed action. The result is then moved to its main memory.

The new processors consist of electronic components that can be connected by wires that mimic biological synapses. Because they are based on large groups of neuron-like elements, they are known as neuromorphic processors, a term credited to the California Institute of Technology physicist Carver Mead, who pioneered the concept in the late 1980s.

They are not “programmed.” Rather the connections between the circuits are “weighted” according to correlations in data that the processor has already “learned.” Those weights are then altered as data flows into the chip, causing them to change their values and to “spike.” That generates a signal that travels to other components and, in reaction, changes the neural network, in essence programming the next actions much the same way that information alters human thoughts and actions.

“Instead of bringing data to computation as we do today, we can now bring computation to data,” said Dharmendra Modha, an I.B.M. computer scientist who leads the company’s cognitive computing research effort. “Sensors become the computer, and it opens up a new way to use computer chips that can be everywhere.”

The new computers, which are still based on silicon chips, will not replace today’s computers, but will augment them, at least for now. Many computer designers see them as coprocessors, meaning they can work in tandem with other circuits that can be embedded in smartphones and in the giant centralized computers that make up the cloud. Modern computers already consist of a variety of coprocessors that perform specialized tasks, like producing graphics on your cellphone and converting visual, audio and other data for your laptop.

One great advantage of the new approach is its ability to tolerate glitches. Traditional computers are precise, but they cannot work around the failure of even a single transistor. With the biological designs, the algorithms are ever changing, allowing the system to continuously adapt and work around failures to complete tasks.

Traditional computers are also remarkably energy inefficient, especially when compared to actual brains, which the new neurons are built to mimic.

I.B.M. announced last year that it had built a supercomputer simulation of the brain that encompassed roughly 10 billion neurons — more than 10 percent of a human brain. It ran about 1,500 times more slowly than an actual brain. Further, it required several megawatts of power, compared with just 20 watts of power used by the biological brain.


Running the program, known as Compass, which attempts to simulate a brain, at the speed of a human brain would require a flow of electricity in a conventional computer that is equivalent to what is needed to power both San Francisco and New York, Dr. Modha said.

I.B.M. and Qualcomm, as well as the Stanford research team, have already designed neuromorphic processors, and Qualcomm has said that it is coming out in 2014 with a commercial version, which is expected to be used largely for further development. Moreover, many universities are now focused on this new style of computing. This fall the National Science Foundation financed the Center for Brains, Minds and Machines, a new research center based at the Massachusetts Institute of Technology, with Harvard and Cornell.

The largest class on campus this fall at Stanford was a graduate level machine-learning course covering both statistical and biological approaches, taught by the computer scientist Andrew Ng. More than 760 students enrolled. “That reflects the zeitgeist,” said Terry Sejnowski, a computational neuroscientist at the Salk Institute, who pioneered early biologically inspired algorithms. “Everyone knows there is something big happening, and they’re trying to find out what it is.”


2013  

Page 9

11/23/2012

Page 10

5/23/2015

Page 11

Modeling Individual Neurons

flow of information

Page 12

Modeling Individual Neurons

rectified

Page 13

Computation With A Binary Threshold Unit

$y = 1$ if $\mathrm{net} > 0$; otherwise $y = 0$

Page 14

Computation With A Binary Threshold Unit

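As a concrete illustration of the binary threshold unit on the two slides above, here is a minimal Python sketch. It is my own toy example, not from the slides: the AND-gate weights and the constant bias input are assumptions chosen for illustration.

import numpy as np

def threshold_unit(x, w):
    net = np.dot(w, x)           # net = sum_i w_i x_i
    return 1 if net > 0 else 0   # y = 1 if net > 0, else y = 0

# Example: weights that make the unit behave as an AND gate.
# The third input is held at 1 so its weight acts as a (negative) bias.
w = np.array([1.0, 1.0, -1.5])
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), threshold_unit(np.array([x1, x2, 1.0]), w))
# prints 0, 0, 0, 1: the unit fires only when both inputs are on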

Page 15

Feedforward Architectures

Page 16

Recurrent Architectures

Page 17

Let’s Get Serious…

ü Training data:  $\{(\mathbf{x}^1, \mathbf{d}^1), (\mathbf{x}^2, \mathbf{d}^2), \ldots, (\mathbf{x}^p, \mathbf{d}^p)\}$   (e.g., big, hairy, stinky => run away)

ü Network model:  $y^\alpha = f_{\mathbf{w}}(\mathbf{x}^\alpha)$

ü Objective function:  $E = \sum_\alpha \sum_k \big(d_k^\alpha - y_k^\alpha\big)^2$

ü Learning rule:  $\Delta w_{ji} = -\varepsilon\, \dfrac{\partial E}{\partial w_{ji}}$
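To make the four ingredients concrete, here is a minimal Python sketch of the whole recipe. It is my own toy example, not from the slides: the data come from a made-up noisy linear rule, and the model is a single linear unit so the gradient can be written in closed form.

import numpy as np

rng = np.random.default_rng(0)

# training data {(x^a, d^a)}: targets generated from a made-up noisy linear rule
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
D = X @ w_true + 0.1 * rng.normal(size=100)

# network model y^a = f_w(x^a), here a single linear unit
w = np.zeros(3)
epsilon = 0.001

for step in range(500):
    Y = X @ w                        # predictions y^a
    E = np.sum((D - Y) ** 2)         # objective E = sum_a (d^a - y^a)^2
    grad = -2 * X.T @ (D - Y)        # dE/dw_i = -2 * sum_a (d^a - y^a) * x_i^a
    w = w - epsilon * grad           # learning rule: Delta w = -epsilon * dE/dw

print(np.round(w, 2))                # should end up close to w_true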

Page 18

Linear Activation Function  =>  Linear Regression via Stochastic Gradient Descent

$y_j^\alpha = \sum_i w_{ji}\, x_i^\alpha$
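Spelling out the per-example gradient for this linear unit (a standard derivation, included here for reference) recovers the LMS/delta rule; the factor of 2 is usually absorbed into the learning rate $\varepsilon$:

$$
E^\alpha = \sum_j \big(d_j^\alpha - y_j^\alpha\big)^2, \qquad
\frac{\partial E^\alpha}{\partial w_{ji}} = -2\,\big(d_j^\alpha - y_j^\alpha\big)\, x_i^\alpha, \qquad
\Delta w_{ji} = -\varepsilon\,\frac{\partial E^\alpha}{\partial w_{ji}} \;\propto\; \big(d_j^\alpha - y_j^\alpha\big)\, x_i^\alpha .
$$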

Page 19

Page 20

Batch Versus Online Training (True Versus Stochastic Gradient Descent)

Page 21

Batch Versus Online Training (True Versus Stochastic Gradient Descent)
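In the notation of the earlier slides, writing $E = \sum_\alpha E^\alpha$ with $E^\alpha = \sum_k (d_k^\alpha - y_k^\alpha)^2$, the two regimes differ only in how many examples contribute to each weight update:

$$
\text{batch (true gradient descent):}\quad \Delta w_{ji} = -\varepsilon \sum_\alpha \frac{\partial E^\alpha}{\partial w_{ji}},
\qquad
\text{online (stochastic gradient descent):}\quad \Delta w_{ji} = -\varepsilon\, \frac{\partial E^\alpha}{\partial w_{ji}} \ \text{ for one example $\alpha$ at a time.}
$$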

Page 22

Extending LMS To Handle Nonlinear Activation Functions And Multilayered Networks

Page 23

Logistic Activation Function
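The slide’s figure is not reproduced in this transcript. For reference, the logistic (sigmoid) activation used on the following slides, and the derivative that produces the $y_j(1-y_j)$ factor in the backpropagation updates, are:

$$
y_j = \frac{1}{1 + e^{-z_j}}, \qquad \frac{\partial y_j}{\partial z_j} = y_j\,\big(1 - y_j\big).
$$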

Page 24

Page 25

Page 26

Page 27

Why Are Nonlinearities Necessary?

ü Prove

§  A network with a linear hidden layer has no more functionality than a network with no hidden layer (i.e., direct connections from input to output)

§  For example, a network with a linear hidden layer cannot learn XOR

(figure: a two-layer network with input x, hidden layer y, and output z, connected by weight matrices W and V)
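A one-line version of the argument, assuming (as one natural reading of the figure) that $W$ maps the input $\mathbf{x}$ to the hidden layer $\mathbf{y}$ and $V$ maps the hidden layer to the output $\mathbf{z}$, with any output nonlinearity $g$:

$$
\mathbf{y} = W\mathbf{x}, \qquad \mathbf{z} = g(V\mathbf{y}) = g\big((VW)\,\mathbf{x}\big),
$$

so the network computes exactly the same family of functions as a network with no hidden layer and weight matrix $U = VW$. Since XOR is not linearly separable, no single layer of weights feeding an output unit can compute it, and therefore neither can a network whose hidden layer is linear.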

Page 28

Page 29

Page 30

Page 31

Changing Loss Function

ü squared error

   $E = \frac{1}{2}\sum_j \big(d_j - y_j\big)^2, \qquad -\frac{\partial E}{\partial y_j} = d_j - y_j$

ü cross entropy (= max likelihood)

   $E = -\sum_j \Big[\, d_j \log y_j + (1 - d_j)\log(1 - y_j) \,\Big], \qquad -\frac{\partial E}{\partial y_j} = \frac{d_j - y_j}{y_j\,(1 - y_j)}$
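Why cross entropy “= max likelihood”: if each output $y_j$ is read as the model’s probability that the binary target $d_j \in \{0,1\}$ equals 1, and the targets are treated as independent, then the log-likelihood of the targets is

$$
\log P(\mathbf{d} \mid \mathbf{y}) = \sum_j \big[\, d_j \log y_j + (1 - d_j)\log(1 - y_j) \,\big] = -E,
$$

so minimizing the cross-entropy loss is the same as maximizing the likelihood of the training targets.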

Page 32

Changing Loss Function

ü Back propagation

§  logistic activation function

   $z_j = \sum_i w_{ji} x_i, \qquad y_j = \frac{1}{1 + \exp(-z_j)}$

§  weight update

   $\Delta w_{ji} = \varepsilon\, \delta_j x_i, \qquad
    \delta_j = \begin{cases} -\dfrac{\partial E}{\partial y_j}\; y_j (1 - y_j) & \text{for output unit} \\[1ex] \Big(\sum_k w_{kj}\, \delta_k\Big)\, y_j (1 - y_j) & \text{for hidden unit} \end{cases}$
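Here is a minimal Python sketch of these update rules for one hidden layer of logistic units and a logistic output unit with squared-error loss, applied to XOR (the earlier slide’s example of what a linear network cannot learn). The network size, learning rate, number of epochs, and the bias-as-extra-unit trick are my own choices, not from the slides.

import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(v):
    # append a constant 1 so bias terms are learned as ordinary weights
    return np.append(v, 1.0)

# toy data: XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hid = 2, 3
W = rng.normal(scale=0.5, size=(n_hid, n_in + 1))   # input (plus bias) -> hidden
V = rng.normal(scale=0.5, size=(1, n_hid + 1))      # hidden (plus bias) -> output
epsilon = 0.5

for epoch in range(10000):
    for x, d in zip(X, D):                   # online (stochastic) updates
        xb = add_bias(x)
        y_hid = logistic(W @ xb)             # y_j = logistic(sum_i w_ji x_i)
        yb = add_bias(y_hid)
        y_out = logistic(V @ yb)
        # output unit: delta = -(dE/dy) * y(1-y) = (d - y) * y * (1 - y) for squared error
        delta_out = (d - y_out) * y_out * (1 - y_out)
        # hidden units: delta = (sum_k w_kj delta_k) * y(1-y); the bias "unit" gets no error
        delta_hid = (V[:, :n_hid].T @ delta_out) * y_hid * (1 - y_hid)
        V += epsilon * np.outer(delta_out, yb)    # Delta w_ji = epsilon * delta_j * input_i
        W += epsilon * np.outer(delta_hid, xb)

for x in X:
    y = logistic(V @ add_bias(logistic(W @ add_bias(x))))
    print(x, np.round(y, 2))                 # typically close to the XOR targets 0, 1, 1, 0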

Page 33

Changing Activation Function 1

ü Back propagation

§  softmax activation function for 1-of-N classification

   $z_j = \sum_i w_{ji} x_i, \qquad y_j = \frac{\exp(z_j)}{\sum_k \exp(z_k)}$

§  weight update

   $\Delta w_{ji} = \varepsilon\, \delta_j x_i, \qquad
    \delta_j = \begin{cases} -\dfrac{\partial E}{\partial y_j}\; y_j (1 - y_j) & \text{for output unit} \\[1ex] \Big(\sum_k w_{kj}\, \delta_k\Big)\, y_j (1 - y_j) & \text{for hidden unit} \end{cases}$

§  gradient is the same when expressed in terms of $y_j$
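The “gradient is the same” remark can be made concrete. Assuming the 1-of-N cross-entropy loss $E = -\sum_k d_k \log y_k$ (the natural pairing with softmax outputs, though the slide does not state the loss explicitly), a standard calculation through the softmax gives

$$
-\frac{\partial E}{\partial z_j} = d_j - y_j,
$$

the same output error signal produced by a logistic output unit paired with the cross-entropy loss of the previous slides, so the weight update $\Delta w_{ji} = \varepsilon\, \delta_j x_i$ carries over unchanged.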

Page 34

Changing Activation Function 2

ü Back propagation

§  tanh activation function

   $z_j = \sum_i w_{ji} x_i, \qquad y_j = \tanh(z_j) = 2\,\mathrm{logistic}(2 z_j) - 1$

§  weight update

   $\Delta w_{ji} = \varepsilon\, \delta_j x_i, \qquad
    \delta_j = \begin{cases} -\dfrac{\partial E}{\partial y_j}\; (1 + y_j)(1 - y_j) & \text{for output unit} \\[1ex] \Big(\sum_k w_{kj}\, \delta_k\Big)\, (1 + y_j)(1 - y_j) & \text{for hidden unit} \end{cases}$

§  incompatible with cross entropy loss
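For reference (standard identities, not spelled out on the slide): the tanh unit is a rescaled logistic, and its derivative is what produces the $(1+y_j)(1-y_j)$ factor above; the incompatibility with cross entropy is that $y_j \in (-1,1)$ is not a probability (it can be negative), so $\log y_j$ and $\log(1-y_j)$ are not well defined for it.

$$
\tanh(z) = 2\,\mathrm{logistic}(2z) - 1, \qquad \frac{d}{dz}\tanh(z) = 1 - \tanh^2(z) = (1 + y)(1 - y).
$$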