PARALLELIZATION OF ARTIFICIAL NEURAL NETWORKS
Joe Bradish
CS5802 Fall 2015
BASICS OF ARTIFICIAL NEURAL NETWORKS
What is an Artificial Neural Network (ANN)?
What makes up a neuron?
How is “learning” modelled in ANNs?
STRUCTURE OF A NEURAL NETWORK
A neural network is a collection of interconnected neurons that compute and generate impulses
Specific parts include neurons, synapses, and activation functions
An artificial neural network is a mathematical model, based on natural neural networks found in animals’ brains.
BASIC STRUCTURE OF A NEURON
• There is an input vector containing {x1, x2, … , xn} and an associated vector of weights {w1, w2, … , wn}.
• The weighted sum of the inputs, w1x1 + w2x2 + … + wnxn, is calculated and fed into an activation function.
• The activation function maps this sum to a value, generally in the range [-1, 1]; a step activation function, for example, outputs only -1 or 1. This value is then considered the output of the neuron.
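To make the bullets above concrete, here is a minimal sketch of a single neuron in Python with NumPy; the input values, weights, and step threshold are made-up illustrations, not taken from the talk.

```python
import numpy as np

def step_activation(s):
    # Step function mapping the weighted sum into {-1, 1}
    return 1.0 if s >= 0.0 else -1.0

def neuron_output(x, w):
    # Weighted sum of inputs, fed into the activation function
    s = np.dot(w, x)              # sum_i w_i * x_i
    return step_activation(s)

x = np.array([0.5, -1.2, 3.0])    # input vector {x1, x2, x3}
w = np.array([0.4,  0.1, 0.2])    # weight vector {w1, w2, w3}
print(neuron_output(x, w))        # 1.0, since the weighted sum 0.68 >= 0
```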
TRAINING A NEURAL NETWORK
To properly train a neural network, the weights must be “tuned” to model the goal function as closely as possible.
The “goal” function is the function that maps input data to output data in our training set.
Training a neural network is by far the most costly step in the majority of scenarios.
Google has reported training times of under two days for certain problems and network sizes.
Once trained, however, a network can classify new items very quickly.
Some popular options:
Backpropagation (used in the majority of cases)
Genetic algorithms with simulated annealing
Hebbian learning
A combination of different methods in a “committee of machines”
BACKPROPAGATION
Most popular training method
Works by reducing error on the training set
Requires many training examples to get the error low
Uses gradient descent on the mean squared error
Partial derivatives are used to determine which neuron/weight to blame for parts of the error
The backward pass is done via backpropagation
Uses the chain rule to calculate the partial derivatives
The underlying operations are embarrassingly parallel, but many problems remain
Backpropagation, communication, and computation issues must all be considered when scaling neural networks
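As a hedged sketch of these steps, the following NumPy snippet trains a tiny one-hidden-layer network by gradient descent on the mean squared error, using the chain rule to compute the partial derivatives in the backward pass. The data, layer sizes, and learning rate are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 training examples, 3 features
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy goal function

W1 = rng.normal(scale=0.5, size=(3, 5))        # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(5, 1))        # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for epoch in range(2000):
    # Forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: the chain rule assigns blame for the error to each weight
    d_out = (out - y) * out * (1 - out)        # sigmoid' = out * (1 - out)
    dW2 = h.T @ d_out / len(X)
    d_h = (d_out @ W2.T) * h * (1 - h)         # error propagated back to the hidden layer
    dW1 = X.T @ d_h / len(X)
    # Gradient descent step on the mean squared error
    W2 -= lr * dW2
    W1 -= lr * dW1
```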
PROBLEMS WITH SCALING BACKPROPAGATION
Requires neurons of one layer to be fully connected to the neurons of the next layer
Lots of communication required
Gradient descent is prone to getting stuck in local optima
Requires many iterations to reduce the error to an acceptable level
Training data set sizes are very large
Rule of thumb for error: the training set size should be roughly the number of weights divided by the permitted classification error rate
10% error rate = 10x the number of weights, 1% = 100x, etc. (e.g., a 31,000-weight network trained to a 1% error rate needs roughly 3.1 million examples)
COMPUTATIONAL ISSUES IN SCALING ANNS
Main operation is matrix multiplication
An N-node layer requires N² scalar multiplications and N sums of N numbers
Requires a good multiply or multiply-and-add function
Activation function: the sigmoid f(x) = 1 / (1 + e^(-x)) is often used
It has to be approximated efficiently
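The kernel described above is small enough to sketch directly: the layer's forward pass is a matrix multiply, and a cheap "fast sigmoid" x / (1 + |x|) stands in for an efficient approximation of f(x) = 1 / (1 + e^(-x)). This particular approximation is one common choice, not necessarily the one used in any system discussed here.

```python
import numpy as np

def sigmoid_exact(z):
    return 1.0 / (1.0 + np.exp(-z))      # f(x) = 1 / (1 + e^-x)

def sigmoid_fast(z):
    # exp()-free approximation that also maps R into (0, 1)
    return 0.5 * (z / (1.0 + np.abs(z))) + 0.5

N = 4
W = np.ones((N, N)) / N                  # N-node layer: N^2 scalar multiplications
x = np.arange(N, dtype=float)
z = W @ x                                # plus N sums of N numbers each
print(sigmoid_exact(z))
print(sigmoid_fast(z))                   # close to the exact values, cheaper to compute
```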
COMMUNICATION ISSUES IN SCALING ANNS
High degree of connectivity
Large data flows
Structure and bandwidth are very important
Broadcasts and ring topologies are often used because of these communication requirements (see the sketch below)
More processors do not necessarily mean faster computation
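To illustrate why a ring topology helps, here is a purely simulated sketch (no real network; the worker values are made up): a global sum over P workers completes in P - 1 neighbor-to-neighbor hops for the reduce and another P - 1 for the broadcast, instead of all-to-all traffic.

```python
P = 4
local = [1.0, 2.0, 3.0, 4.0]      # each worker's local value (illustrative)

# Reduce phase: a running sum travels around the ring, P - 1 hops
acc = local[0]
for i in range(1, P):
    acc += local[i]               # worker i adds its value and forwards the sum

# Broadcast phase: the total travels around the ring, P - 1 more hops
result = [acc] * P                # every worker now holds the global sum
print(result)                     # [10.0, 10.0, 10.0, 10.0]
```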
TWO KEY METHODOLOGIES
Model dimension
One model, but multiple workers train individual parts of it
High amount of communication: need to synchronize at the edges
Efficient when the computation per neuron is heavy
Suited to datasets where each data point contains many attributes
Data dimension
Different workers train on completely different sets of data
Also a high amount of communication: parameters/weights must be synchronized to ensure a consistent model (see the sketch below)
Efficient when each weight needs a high amount of computation
Suited to large datasets where each data point contains only a few attributes
(Figure: example of splitting on the data dimension)
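A hedged sketch of the data-dimension approach using MPI (via mpi4py, with a toy linear model that is not from the talk): each rank computes a gradient on its own data shard, and an Allreduce averages the gradients so the replicated weights stay consistent on every worker. Run with e.g. mpirun -n 4 python script.py.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

# Each worker trains on its own shard of the data (synthetic here)
rng = np.random.default_rng(rank)
X = rng.normal(size=(1000, 10))
y = (X.sum(axis=1) > 0).astype(float)

w = np.zeros(10)                                  # weights replicated on every rank
lr = 0.1
for step in range(100):
    # Local gradient of the mean squared error on this shard only
    grad_local = X.T @ (X @ w - y) / len(y)
    # Synchronize: sum gradients across ranks, then average
    grad_global = np.empty_like(grad_local)
    comm.Allreduce(grad_local, grad_global, op=MPI.SUM)
    w -= lr * grad_global / nprocs                # identical update on all ranks
```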
SPANN (SCALABLE PARALLEL ARTIFICIAL NEURAL NETWORK)
Inspired by human brain’s ability to communicate between groups of neurons without fully connected paths
Focused on parallelizing the model dimension
Uses MPI library
Reduces need for communication between every neuron in consecutive layers of a neural network
Only boundary values are communicated between “ghost” neurons
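The slides do not spell out SPANN's exact protocol, but the "ghost neuron" idea resembles the halo exchange common in MPI codes: each processor owns a block of a layer and swaps only boundary values with its neighbors. The sketch below shows that general pattern with mpi4py; all names and sizes are illustrative assumptions.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

local = np.full(10, float(rank))   # this rank's block of a layer's activations
ghost_left = np.empty(1)           # slots for the neighbors' boundary values
ghost_right = np.empty(1)

left = (rank - 1) % nprocs
right = (rank + 1) % nprocs

# Exchange only boundary values with neighbors, instead of full connectivity
comm.Sendrecv(local[-1:], dest=right, recvbuf=ghost_left, source=left)
comm.Sendrecv(local[:1], dest=left, recvbuf=ghost_right, source=right)
```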
BIOLOGICAL INSPIRATION
Neocortex is the part of the brain most commonly associated with intelligence
Columnar structure with an estimated 6 layers
SPANN CONT.
(Figure: serial backpropagation vs. parallel backpropagation formulas)
• L is the number of layers, including input/output layers
• Nproc is the number of processors being used
• Every input is sent to every processor
• Each processor has only Nhidden / Nproc hidden neurons and Nout / Nproc output neurons
• Divide the total weight count by the number of processors to get the weights per processor
Example comparison of a 3-layer network:
• Serial ANN
• 200 input, 48 output, 125 hidden
• (200 + 48) × 125 = 31,000 weights need to be trained
• Using SPANN in a parallel ANN
• 200 input, 48 output, 120 hidden
• 6 layers, 8 processors
• 30,280 weights need to be trained, but only 3,785 per processor
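A quick sanity check of the slide's arithmetic (the SPANN total of 30,280 is taken directly from the slide, since the exact layout formula is not given here):

```python
n_in, n_out, n_hidden = 200, 48, 125
print((n_in + n_out) * n_hidden)   # 31000 weights in the serial ANN

spann_total = 30_280               # from the slide (6 layers, 8 processors, 120 hidden)
print(spann_total // 8)            # 3785 weights per processor
```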
PERFORMANCE COMPARISON
• 37,890 weights on a serial ANN took 1,313 seconds to complete training, compared to 30,240 weights taking 842 seconds
• There is a significant slowdown in the serial version: resolution 8 computes ~36 weights/sec, but resolution 9 falls to only ~28.5 weights/sec
• The time taken per weight grows more slowly in SPANN, so once the training data reaches a significant size, SPANN becomes much quicker per weight
• The speedup factor is related to the training data size: the larger the size, the larger the speedup
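The per-weight throughput follows directly from the reported times; a quick check using the slide's numbers:

```python
print(30240 / 842)    # ~35.9 weights/sec at resolution 8
print(37890 / 1313)   # ~28.9 weights/sec at resolution 9 (the slide cites ~28.5)
```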
SPANN CONCLUSIONS
Developed an architecture that can scale to billions of weights/synapses
Succeeds by reducing the communication between layers to a few “gatekeeper” nodes
Uses a human biological model as inspiration
SCALING ANNS CONCLUSIONS
• Neural networks are a tool that has provided significant developments in the artificial intelligence and machine learning fields
• Scaling issues are significant, even though the calculations are embarrassingly parallel
• Communication
• Computational
• SPANN showed promising results
• Research continues today
• Heavy focus on communication, as training set sizes are growing faster than the computational requirements in many cases
QUESTIONS?