HC-4016, Heterogeneous Implementation of Neural Network Algorithms, by Dmitri Yudanov and Leon Reznik
HETEROGENEOUS IMPLEMENTATION OF NEURAL NETWORK ALGORITHMS
Dmitri Yudanov (AMD), Leon Reznik (RIT)
| Heterogeneous implementation of Neural network algorithms | NOVEMBER 2013 | CONFIDENTIAL
AGENDA
Neural Networks: Origin, Features, Applications
Spiking Neural Networks (SNN): Simulation Principles
SNN: Heterogeneous Implementation
Neural Networks: Origin, Features, Applications
NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
! From Biological to Artificial Neural Networks (ANN)
! ANN Applications ‒ Application categories ‒ Examples
! Why ANN?
! Why Spiking Neural Network (SNN)?
OUTLINE
FROM BIOLOGICAL TO ARTIFICIAL NEURAL NETWORK (ANN)
! ANN is a simplification of a biological neural network.
! ANN consists of simple elements (neurons) analogous to the biological neurons in the brain.
! The neurons are connected by weighted links and form a network.
! The links pass signals (numbers) from one neuron to another. Neurons operate on the weighted signals and retransmit the results.
! The network can learn by adjusting the weights (the behavior is encoded in the weights).
NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
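As an illustrative sketch of the bullet points above — weighted links, a threshold activation, and learning by weight adjustment — here is a minimal single-neuron example in Python. The function names, learning rate, and AND-gate task are hypothetical illustrations, not part of the original slides:

```python
# Minimal artificial neuron: a weighted sum of inputs passed through a
# step activation; learning adjusts the weights (illustrative sketch).

def neuron_output(weights, bias, inputs):
    # Weighted sum of incoming signals, then a threshold activation.
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s >= 0 else 0

def train_step(weights, bias, inputs, target, lr=0.1):
    # Perceptron rule: nudge the weights toward reducing the output error.
    error = target - neuron_output(weights, bias, inputs)
    new_w = [w + lr * error * x for w, x in zip(weights, inputs)]
    new_b = bias + lr * error
    return new_w, new_b

# Learn logical AND from examples: the behavior ends up encoded in the weights.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = [0.0, 0.0], 0.0
for _ in range(20):
    for x, t in data:
        w, b = train_step(w, b, x, t)
```

After training, the neuron classifies all four AND patterns correctly; the learned behavior lives entirely in `w` and `b`, mirroring the "behavior is encoded in weights" point above.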
ANN APPLICATION CATEGORIES NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
! Based on a patent and application search (US Patent and Trademark Office, EU Patent Office, Google Patent Search), conducted in 2012 by students of the Machine Learning class (Dr. Leon Reznik, RIT)
WHY ANN? EXAMPLES NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
! Recognition ‒ Character (e.g. mail), speech, image (e.g. image clustering), odor (e.g. locust antennal lobe), face and emotion
! Gaming ‒ AI features in games
! Robotics ‒ Vision, spatial navigation and planning (e.g. mental maps with place cells), positioning, decision making
! Control ‒ Missile guidance ‒ Anti-lock brakes (Ford) ‒ Self-driving cars, UAVs
! Crime prevention and security ‒ Bomb sniffer (JFK airport) ‒ Credit card fraud detection (Visa)
! Biomedical ‒ Neuroscience: brain modeling and simulation
‒ US BRAIN Initiative (expected 300 EB/day) ‒ EU Human Brain Project
‒ Neurology (e.g. disease modeling and forecasting, ModelDB)
‒ Cardiology (e.g. adaptive biventricular pacemaker) ‒ Prosthesis: BCI, neuromorphic chips
! Financial analysis ‒ Mortgage risk evaluation (AVCO, Irvine) ‒ Currency trading (Citibank)
! Difficulties ‒ Need to compute fast, but the problem size is large ‒ How to get the right ANN circuit for an application?
WHY ANN? NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
! Novel algorithms ‒ Conventional algorithms' performance is not satisfactory in numerous problems with dynamic changes (e.g. face recognition may fail if the view angle is different or the person is smiling).
! Learning, adaptability ‒ Continuously learn from the available data and adapt to new conditions.
! Reliability ‒ Performance tends to degrade gracefully under partial damage. Parts of the network can learn to perform the function of damaged parts. In contrast, most programs and engineered systems are brittle: if you remove some arbitrary parts, very likely the whole system ceases to function.
! Low power. Neuromorphic engineering ‒ Switching speed of biological neurons is less than 1 kHz (CPU: 3 GHz)
‒ Switching energy of biological neurons ~ 1.0E-17 joules/op (CPU: 1.0E-5 joules/op)
‒ Conduction speed of a biological neural network ~ 100 m/s
! Parallel ‒ The brain performs massively parallel computations very efficiently. Data and processing have global impact. For example, complex visual perception occurs within less than 100 ms, that is, about 10 processing steps.
! AI. Consciousness. Intelligence. Self-awareness.
WHY SNN? NEURAL NETWORK CATEGORIES
! Which level of abstraction to choose?
! Which one is right for the target application?
! Point-to-point connected spiking neural network (SNN): time (spikes), polychronization (memory capacity), unsupervised learning (synaptic plasticity)
NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
[Figure: neural network categories — Rosenblatt perceptron, ADALINE, Hopfield, recurrent networks, MLP, RBF, LVQ, Neural Gas, ASNN, SOM, and SNN — arranged along axes of complexity, learning ability, time dynamics, and biological realism]
Spiking Neural Networks: Simulation Principles
OUTLINE
! SNN Models
! Synaptic Plasticity
! Simulation Types
‒ Time-driven (synchronous) simulation ‒ Event-driven (asynchronous) simulation ‒ Timed event-driven (hybrid) simulation
! Numerical Integration Methods ‒ Euler ‒ Parker-Sochacki
! Summary
SPIKING NEURAL NETWORKS (SNN): SIMULATION PRINCIPLES
HETEROGENEOUS IMPLEMENTATION: SIMULATORS AND ABSTRACTION LEVEL SNN: HETEROGENEOUS IMPLEMENTATION
! Population model ‒ Nengo
! Point-neuron network models ‒ NEST ‒ PCSIM ‒ Brian
! Compartmental neuron and membrane models ‒ NEURON ‒ GENESIS
! Reaction-diffusion model of biochemical signaling pathways ‒ STEPS
SNN MODELS: TRADEOFFS SNN SIMULATION PRINCIPLES
! Integrate-and-Fire (IF): simple, but has a poor spiking response
! Hodgkin-Huxley (HH): has a rich response, but complex
! Izhikevich (IZ): simple, has a rich response, but phenomenological
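The Izhikevich (IZ) model mentioned above is compact enough to sketch directly. The equations come from the Izhikevich (2003) paper cited in the literature list (v′ = 0.04v² + 5v + 140 − u + I, u′ = a(bv − u), with reset v ← c, u ← u + d on a spike); the step size, driver loop, and function names here are assumptions for illustration:

```python
def izhikevich_step(v, u, I, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0):
    """One forward-Euler step of the Izhikevich (2003) point-neuron model.

    Returns (v, u, spiked). Defaults a..d give a regular-spiking cell."""
    v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u += dt * a * (b * v - u)
    if v >= 30.0:            # spike threshold: reset membrane and recovery
        return c, u + d, True
    return v, u, False

def run(I, steps=2000, dt=0.5):
    # Drive one neuron with a constant current and count emitted spikes.
    v, u, spikes = -65.0, -13.0, 0
    for _ in range(steps):
        v, u, fired = izhikevich_step(v, u, I, dt)
        spikes += fired
    return spikes
```

With a constant input current the model spikes tonically, while with zero input it settles to rest, which is the "simple but rich response" tradeoff the slide refers to.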
SYNAPTIC PLASTICITY SNN SIMULATION PRINCIPLES
! Long-term plasticity (minutes – hours) ‒ LTP (signaling pathways, post- and pre-synaptic activity correlation)
‒ LTD (strong or persistent weak stimulation, inactivity, drugs)
! STDP
! Synapse: how it works
‒ Spikes → vesicles → fusing → transmitter crossing the cleft → binding
‒ Synaptic strength → PSP strength
! Synaptic strength: ‒ Transmitter release volume ‒ Connections: number, size ‒ Channels, receptors: density, type, conductance
! Short-term plasticity (milliseconds – minutes) ‒ Facilitation (spiking rate → presynaptic Ca²⁺ → fusing rate) ‒ Fatigue (transmitter release vs. recycle rate → depletion of vesicles)
! Events aligned to a time grid ‒ Can update all neurons at the same time
‒ Good for parallel implementation
! Time quantization error ‒ Delayed or missing events ‒ Can be controlled by the size of dt: the smaller the step, the smaller the error, but the more computation per unit of time
TIME-DRIVEN (SYNCHRONOUS) SIMULATION SNN SIMULATION PRINCIPLES
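A minimal sketch of the time-driven scheme above, assuming hypothetical leaky integrate-and-fire dynamics (the model, names, and constants are illustrative, not from the slides): every neuron advances once per fixed step dt, so spike times are quantized to the grid — which is exactly the quantization error the slide describes.

```python
# Time-driven (synchronous) loop: all neurons are updated together on a
# fixed grid of size dt; spike times are rounded to grid steps.

def simulate_time_driven(n_neurons, steps, dt, inputs, threshold=1.0, tau=10.0):
    v = [0.0] * n_neurons                 # membrane state per neuron
    spikes = []                           # (step, neuron) pairs, grid-aligned
    for step in range(steps):
        for i in range(n_neurons):        # every neuron, every step
            v[i] += dt * (-v[i] / tau + inputs[i])
            if v[i] >= threshold:         # spike time quantized to the grid
                spikes.append((step, i))
                v[i] = 0.0                # reset after firing
    return spikes
```

Shrinking `dt` reduces the quantization error but multiplies the number of inner-loop updates per unit of simulated time, matching the tradeoff stated above.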
! Events are unique in time: ‒ A single event can change the state of the whole system
‒ Have to update neurons sequentially in the order of events
‒ Minimum transmission latency is unknown
‒ Assumes an analytical solution for the model equations
‒ … or a timed event-driven update
! Time quantization error ‒ No error caused by the simulation type ‒ Better event accuracy ‒ Good for STDP
EVENT-DRIVEN (ASYNCHRONOUS) SIMULATION SNN SIMULATION PRINCIPLES
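The event-driven scheme above can be sketched with a priority queue: events are processed strictly in time order, and a neuron's state is advanced analytically (here, exact exponential decay) only when an event arrives, so no grid error is introduced. All names, weights, and time constants are illustrative assumptions:

```python
import heapq
import math

# Event-driven (asynchronous) sketch: a time-ordered event queue plus
# lazy, analytically exact state updates between events.

def simulate_event_driven(initial_events, delay, weight=0.3,
                          threshold=1.0, t_end=50.0):
    queue = list(initial_events)          # (time, target_neuron) tuples
    heapq.heapify(queue)
    v, last_t = {}, {}                    # lazy per-neuron state
    spikes = []
    while queue:
        t, n = heapq.heappop(queue)       # earliest event first
        if t > t_end:
            break
        # Exact exponential decay (tau = 10) since the last event: no grid.
        dt = t - last_t.get(n, t)
        v[n] = v.get(n, 0.0) * math.exp(-dt / 10.0) + weight
        last_t[n] = t
        if v[n] >= threshold:
            spikes.append((t, n))
            v[n] = 0.0
            # A spike schedules one delayed event at the next neuron.
            heapq.heappush(queue, (t + delay, n + 1))
    return spikes
```

Note how one popped event can enqueue new events for other neurons: this is why a single event can change the state of the whole system and why updates must proceed sequentially in event order.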
! Events are unique in time: ‒ A single event can change the state of the whole system, but not within the minimum transmission delay
‒ Time grid: dt is equal to the minimum delay ‒ Update all neurons at the same time every dt increment
‒ Also, between dt increments, update every neuron in the order of the events it receives within the increment.
‒ Good for parallel implementation, but there is computation divergence across neurons.
! Time quantization error ‒ No error caused by the simulation type ‒ Better event accuracy ‒ Good for STDP
TIMED EVENT-DRIVEN (HYBRID) SIMULATION SNN SIMULATION PRINCIPLES
NUMERICAL INTEGRATION METHODS SNN SIMULATION PRINCIPLES
! Motivation. Need to solve an initial value problem (IVP).
! Euler. Compute the next y based on the tangent at the current y.
! Modified Euler. Predict with Euler, correct with the average slope.
! Runge-Kutta (4th order). Evaluate and average.
! Bulirsch–Stoer ‒ Uses the modified midpoint method with evaluation and an error tolerance check using extrapolation with rational functions. Provides adaptive order. Generally more suited to smooth functions.
! Parker-Sochacki ‒ Expresses the IVP in terms of power series. Provides adaptive order.
NUMERICAL INTEGRATION METHODS: EULER SNN SIMULATION PRINCIPLES
$y'(t) = f(t, y(t)), \quad y(t_0) = y_0$
$y_{n+1} = y_n + h\,f(t_n, y_n)$
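The forward-Euler update translates directly into code. A minimal sketch (the function name and the test problem y′ = y are illustrative choices):

```python
# Forward Euler for y' = f(t, y): advance along the tangent at each step.

def euler(f, t0, y0, h, steps):
    """Iterate y_{n+1} = y_n + h * f(t_n, y_n) and return the final y."""
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t = t + h
    return y
```

For example, integrating y′ = y from y(0) = 1 to t = 1 with h = 0.001 approximates e ≈ 2.71828; the first-order error shrinks roughly linearly with h.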
NUMERICAL INTEGRATION METHODS: PARKER-SOCHACKI SNN SIMULATION PRINCIPLES
! Assume that the solution function can be represented with a power series: $y(t) = \sum_{k=0}^{\infty} y_k t^k$
! Therefore, its derivative, based on Maclaurin series properties, is $y'(t) = \sum_{k=0}^{\infty} (k+1)\,y_{k+1} t^k$
! A typical IVP: $y'(t) = f(t, y(t)), \quad y(t_0) = y_0$
! As a result: $y_{k+1} = \dfrac{[f]_k}{k+1}$, where $[f]_k$ is the $k$-th power-series coefficient of $f$
NUMERICAL INTEGRATION METHODS: PARKER-SOCHACKI SNN SIMULATION PRINCIPLES
! If $f$ is linear: $y' = ay + b$
! Shift it to eliminate the constant term: $z = y + b/a$, so that $z' = az$
! As a result, the equation becomes: $z_{k+1} = \dfrac{a\,z_k}{k+1}$
! With finite order $N$: $y(t) \approx \sum_{k=0}^{N} y_k (t - t_0)^k$
! Parallelism: ‒ Loop-level parallelism ‒ Parallel reduction
! Benefit: adaptive order and error tolerance control ‒ The local Lipschitz constant determines the number of iterations needed to achieve a given error tolerance.
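A hedged sketch of the Parker-Sochacki idea for the linear case above: after the shift z = y + b/a, the series coefficients obey z_{k+1} = a·z_k/(k+1), and the order adapts by adding terms until they drop below a tolerance. The function name and defaults are assumptions for illustration:

```python
# Parker-Sochacki sketch for the linear IVP y' = a*y + b, y(0) = y0.
# After the shift z = y + b/a the ODE becomes z' = a*z, whose Maclaurin
# coefficients satisfy z_{k+1} = a * z_k / (k + 1).

def parker_sochacki_step(y0, a, b, h, tol=1e-12, max_order=50):
    z = y0 + b / a                 # shifted variable: no constant term left
    total, term = z, z             # running series sum and current term
    for k in range(1, max_order + 1):
        term = term * a * h / k    # next series term, z_k * h**k
        total += term
        if abs(term) < tol:        # adaptive order: stop once converged
            break
    return total - b / a           # shift back to y
```

For y′ = −y + 1 with y(0) = 0, the exact value is y(h) = 1 − e^(−h); with h = 0.5 the series reaches the 1e-12 tolerance after roughly a dozen terms, illustrating the adaptive-order behavior noted on the slide.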
SUMMARY SNN SIMULATION PRINCIPLES
! Neuron/Synapse Model
! Simulation Type
! Integration Method
! Application ! Requirements
! Result
Spiking Neural Networks: Heterogeneous Implementation
OUTLINE
! Simulation Flow ‒ Synchronous ‒ Hybrid ‒ Combined
! Implementation of Hybrid Simulation Type ‒ Simulation Flow ‒ Simulation Phases
‒ Update ‒ Expand ‒ Sort
‒ Results
! Heterogeneous Implementation of Synchronous Simulation Type
‒ NEST Simulator ‒ Software Architecture
SNN: HETEROGENEOUS IMPLEMENTATION
SYNCHRONOUS SIMULATION FLOW
! Simulation step (dt) has two phases:
‒ Update: ‒ Compute the new state for all neurons. ‒ Detect spiked neurons and process them separately to update the spike history (divergence reduction).
‒ Propagation: ‒ Expand spikes into arriving events.
SNN: HETEROGENEOUS IMPLEMENTATION
HYBRID SIMULATION FLOW
! Simulation step (dt) has two phases:
‒ Update: ‒ Compute the new state for all neurons at the times of arriving spikes (event-driven).
‒ Detect spiked neurons and process them separately to compute the spike time and update the spike history (divergence reduction).
‒ Propagation: ‒ Expand spikes into arriving events. ‒ Sort the events that are due for delivery in the current time step by arrival time for each neuron.
‒ Create a pointer array that maps neurons to their sorted events.
SNN: HETEROGENEOUS IMPLEMENTATION
COMBINED SIMULATION FLOW
! Exchange spikes between compute nodes (MPI) ‒ A spike is a (time stamp, source neuron ID) pair
! Store spikes in the spike ring buffer ‒ How many ring segments? int(max delay / min delay) ‒ The ring 'rotates' every step by one segment
! Expand spikes ‒ Spike segments are matched with the relevant delay segments (synaptic connectivity matrix)
‒ Arrival time is computed ‒ Synaptic events that are due are filtered
! Sort synaptic events by arrival time for each target neuron (event-driven only)
! Update neurons
! Update synapses
! Gather new spikes
SNN: HETEROGENEOUS IMPLEMENTATION
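The spike ring buffer in the flow above can be sketched as follows: the segment count is int(max delay / min delay), and the ring "rotates" by one segment per simulation step. The class and method names are illustrative, not taken from the actual implementation:

```python
# Spike ring buffer sketch: one segment per delivery step, with the
# segment count fixed at int(max_delay / min_delay).

class SpikeRingBuffer:
    def __init__(self, max_delay, min_delay):
        self.n = int(max_delay / min_delay)    # number of ring segments
        self.segments = [[] for _ in range(self.n)]
        self.head = 0                          # segment due this step

    def schedule(self, spike, delay_steps):
        # Place the spike into the segment that arrives delay_steps from now.
        self.segments[(self.head + delay_steps) % self.n].append(spike)

    def rotate(self):
        # Deliver the current segment, then advance the ring by one step.
        due = self.segments[self.head]
        self.segments[self.head] = []
        self.head = (self.head + 1) % self.n
        return due

# A spike scheduled with delay 2 is delivered on the third rotation
# (rotations deliver steps 0, 1, then 2).
buf = SpikeRingBuffer(4.0, 1.0)
buf.schedule("spike-A", 2)
first, second, third = buf.rotate(), buf.rotate(), buf.rotate()
```

Because a spike can never arrive sooner than the minimum delay, each segment can be filled and delivered in bulk, which is what makes the per-step exchange pattern above work.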
IMPLEMENTATION OF HYBRID SIMULATION: UPDATE PHASE
! Wave-fronts (WFs) work on their segments of neurons, represented by parameters and state stored in global memory (GM)
! A work-item (WI) takes a neuron and updates its state at every arriving event
! The state is stored back to GM
! Spike data is accumulated in the local data store (LDS) and flushed to GM periodically.
! Spiked neurons are processed in a separate kernel (divergence reduction) ‒ Spike time is computed with the Newton-Raphson method (NR)
‒ Spiked neurons are updated for the rest of the arriving events.
SNN: HETEROGENEOUS IMPLEMENTATION
IMPLEMENTATION OF HYBRID SIMULATION: EXPAND PHASE
! Load source spike packets from GM and store them in a contiguous array in LDS.
! Load the synaptic pointer to LDS. ‒ Each neuron is connected to 100s or even 1000s of other neurons. The synaptic pointer describes where to get the synaptic data of the target neurons for a known spike source neuron.
! Main loop ‒ A WF picks a source spike (time stamp, source neuron ID) and the pointer
‒ A WI loads the synaptic data for a target neuron, computes the arrival time, and stores the synaptic event in the ring buffer in GM.
! Along the way, the sort histogram (required by radix sort) is loaded into and stored in LDS. It is updated to reflect the newly created synaptic events.
SNN: HETEROGENEOUS IMPLEMENTATION
IMPLEMENTATION OF HYBRID SIMULATION: SORT PHASE
! We need to order synaptic events by arrival time and by target ID
! Radix sort: select the next radix from LSD to MSD and group numbers by radix value from smallest to largest ‒ Group numbers based on the current radix and compute a histogram (the count of numbers with the same radix value)
‒ Scan the histogram: compute the prefix sum (the global offset for the next grouping).
! 8 passes for 32-bit addressing and a 4-bit radix.
SNN: HETEROGENEOUS IMPLEMENTATION
Radix sort example: 1-bit radix, LSD sort.
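The histogram / prefix-scan / scatter passes described above can be sketched serially in Python. The actual implementation is a parallel GPU radix sort (see the Harada and Howes reference); this serial version only illustrates the per-pass structure, including why 32-bit keys with a 4-bit radix need 8 passes:

```python
# LSD radix sort: per pass, histogram the current digit, prefix-scan the
# histogram into bucket offsets, then scatter keys stably.

def radix_sort(keys, radix_bits=4, key_bits=32):
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):   # 32/4 = 8 passes
        # Histogram: count of keys per radix value.
        hist = [0] * (mask + 1)
        for k in keys:
            hist[(k >> shift) & mask] += 1
        # Exclusive prefix sum: starting offset of each bucket.
        offsets, total = [], 0
        for count in hist:
            offsets.append(total)
            total += count
        # Scatter keys to their bucket positions (stable within a pass).
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

Stability within each pass is what lets later (more significant) passes preserve the ordering established by earlier ones; on the GPU, the histogram and prefix sum become the parallel reduction and scan steps mentioned above.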
IMPLEMENTATION OF HYBRID SIMULATION: PERFORMANCE SNN: HETEROGENEOUS IMPLEMENTATION
Network Size (neurons) | Average Synapses per Neuron | Average Events per Step | Average Spikes per Step | Total Synapse Count (millions) | "Tahiti" GPU Time per Step (ms)
2,100,000 | 90 | 230,000 | 2,522 | 190 | 13.5
131,000 | 1,458 | 370,000 | 257 | 191 | 5.7
16,000 | 11,677 | 300,000 | 25 | 191 | 3.2
! Size-connection scalability in multi-precision networks with per-WF precision allocation
! 1000 iterations, 250 µs step
! Randomly-connected SNN with only AMPA synapses
! Speedups of up to 100× depending on the configuration and the devices compared
HETEROGENEOUS IMPLEMENTATION: SIMULATOR ARCHITECTURE SNN: HETEROGENEOUS IMPLEMENTATION
! Interface: Python – SLI – Network class (C++)
! Object-oriented: Nodes – Connections – Events
! Network: administrates node connections
! Scheduler: orchestrates the simulation
‒ Node management: update, prepare, finalize ‒ Execution type selection: serial, p-threads, OpenMP ‒ Step scheduling ‒ Event transmission via the Communicator
! Communicator ‒ Inter-process communication ‒ MPI
! Features ‒ Primarily used as a vehicle for neuroscience research ‒ Generic, suitable for SNN applications ‒ Both time- and event-driven simulation types ‒ Flexible node dynamics, a variety of built-in models ‒ Communication infrastructure to deliver both discrete and continuous events at the same time.
‒ Emphasis on correctness, performance and scalability
HETEROGENEOUS IMPLEMENTATION: SOFTWARE ARCHITECTURE SNN: HETEROGENEOUS IMPLEMENTATION
! Simplified UML diagram for the heterogeneous part of the implementation
! Neuron model templates (single and double precision) with an OpenCL™ update phase
! Object-oriented design with shared vector members (data redundancy reduction)
! STL-like containers with OpenCL™ memory/buffer types underneath
! On-the-fly CPU-GPU execution steering: adaptability
! Data structure size stability: statistical monitoring, steering, error reporting
CONCLUSION HETEROGENEOUS IMPLEMENTATION OF NEURAL NETWORK ALGORITHMS
! Thank You!
LITERATURE HETEROGENEOUS IMPLEMENTATION OF NEURAL NETWORK ALGORITHMS
! R. Brette et al., "Simulation of networks of spiking neurons: A review of tools and strategies," Journal of Computational Neuroscience, vol. 23, no. 3, pp. 349-398, 2007.
! B. Gaster, D. R. Kaeli, L. Howes, and P. Mistry, Heterogeneous Computing with OpenCL™. Morgan Kaufmann, 2011.
! T. Harada and L. Howes, "Introduction to GPU Radix Sort," Heterogeneous Compute, Dec. 2011. [Online].
! E. M. Izhikevich, "Simple model of spiking neurons," IEEE Transactions on Neural Networks, vol. 14, no. 6, pp. 1569-1572, 2003.
! R. Stewart and W. Bair, "Spiking neural network simulation: numerical integration with the Parker-Sochacki method," Journal of Computational Neuroscience, vol. 27, no. 1, pp. 115-133, Aug. 2009.
! D. Yudanov and L. Reznik, "Scalable multi-precision simulation of spiking neural networks on GPU with OpenCL™," in Proc. International Joint Conference on Neural Networks (IJCNN), IEEE, 2012.
THANKS HETEROGENEOUS IMPLEMENTATION OF NEURAL NETWORK ALGORITHMS
! Wayne Burleson
! Mayank Daga
! Markus Diesmann
! Joseph Dinh
! Tan Ho
! Austin Hung
! Jeremy Johnson
! John Keaty
! Bingley Li
! Marc-Oliver Gewaltig
! Saul Mar?nez
! Haibin Niu
! Kyle Pour
! Jason Shantz
! Jason Tang
! Yury Zaytsev
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL™ is a trademark of Apple Inc. Other names are for informational purposes only and may be trademarks of their respective owners.