Google Big Data Expo

A 30-minute short walk

Robert Saxby - Big Data Product Specialist

Sustainability

Google data centers have half the overhead of typical industry data centers

Largest private investor in renewables: $2 billion generating 3.2 GW

Applying machine learning produced a 40% reduction in cooling energy

Drivers of Success in AI/ML Projects

Large datasets, cutting-edge models, compute at scale

End to End: Google Cloud AI Spectrum

ML researcher: build custom models
Data scientist: use/extend OSS SDK (Cloud ML Engine, Cloud MLE)
App developer: use pre-built models (ML Perception services)


What is TensorFlow?

● A system for distributed, parallel machine learning
● It’s based on general-purpose dataflow graphs
● It targets heterogeneous devices
  ○ A single PC with CPU
  ○ A single PC with GPU(s)
  ○ A mobile device
  ○ Clusters of 100s or 1000s of CPUs, GPUs and TPUs


Another data flow system

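[Figure: a dataflow graph in which examples and weights feed MatMul; its output and biases feed Add; Add feeds Relu; Relu and labels feed Xent (cross-entropy).]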

Graph of Nodes, also called Operations or ops


With tensors

[Figure: the same graph (MatMul, Add, Relu and Xent over examples, weights, biases and labels), now showing tensors flowing along its edges.]

Edges are N-dimensional arrays: Tensors
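To make "ops connected by tensor-carrying edges" concrete, here is a minimal pure-Python sketch of the graph on these slides. It is not TensorFlow's API: the Op class and run() helper are hypothetical, tensors are plain nested lists, and the weight, bias, example and label values are made up. Each op records its input ops, and run() evaluates a node only after its inputs, so values flow along the edges.

```python
import math

class Op:
    """A node (op) in a dataflow graph; its inputs are the incoming edges."""
    def __init__(self, name, fn=None, inputs=()):
        self.name = name
        self.fn = fn              # fn=None marks a placeholder fed at run time
        self.inputs = list(inputs)

def run(op, feed, cache=None):
    """Evaluate op after evaluating each input op; tensors flow along the edges."""
    cache = {} if cache is None else cache
    if op.fn is None:
        return feed[op.name]
    if op.name not in cache:
        args = [run(i, feed, cache) for i in op.inputs]
        cache[op.name] = op.fn(*args)
    return cache[op.name]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, v):
    return [[x + y for x, y in zip(row, v)] for row in m]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def xent(logits, labels):
    """Mean softmax cross-entropy against one-hot labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        log_norm = math.log(sum(math.exp(v) for v in z))
        total -= sum(t * (v - log_norm) for v, t in zip(z, y))
    return total / len(logits)

# examples, weights -> MatMul -> Add(biases) -> Relu -> Xent(labels)
examples, labels = Op("examples"), Op("labels")
weights = Op("weights", lambda: [[1.0, -1.0], [0.5, 2.0]])   # hypothetical values
biases = Op("biases", lambda: [0.0, 0.5])                   # hypothetical values
matmul_op = Op("MatMul", matmul, [examples, weights])
add_op = Op("Add", add_bias, [matmul_op, biases])
relu_op = Op("Relu", relu, [add_op])
xent_op = Op("Xent", xent, [relu_op, labels])

feed = {"examples": [[1.0, 2.0]], "labels": [[0.0, 1.0]]}
print(run(relu_op, feed))   # [[2.0, 3.5]]
print(run(xent_op, feed))
```

Nothing is computed when the graph is built; work happens only when run() is asked for a particular node, which is the same split between graph construction and execution that the slides describe.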


What’s in a name?

0 Scalar (magnitude only) s = 483

1 Vector (magnitude and direction) v = [1.1, 2.2, 3.3]

2 Matrix (table of numbers) m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

3 3-Tensor (cube of numbers) t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]]

4 n-Tensor (you get the idea) ....
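A small sketch of how rank relates to nesting depth, using the example values above as nested Python lists. The shape() and rank() helpers are hypothetical and assume rectangular nesting. Rank is the number of dimensions, and shape lists the size of each dimension.

```python
def shape(t):
    """Shape of a nested-list tensor; assumes every level is rectangular."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None
    return dims

def rank(t):
    """Rank = number of dimensions = how many levels of nesting."""
    return len(shape(t))

s = 483
v = [1.1, 2.2, 3.3]
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]]

for name, x in [("s", s), ("v", v), ("m", m), ("t", t)]:
    print(name, "rank", rank(x), "shape", shape(x))
# s rank 0 shape []
# v rank 1 shape [3]
# m rank 2 shape [3, 3]
# t rank 3 shape [3, 3, 1]
```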


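Convolutional layer

Two 4x4 filters over 3 input channels, W1[4, 4, 3] and W2[4, 4, 3] (+padding), are stacked into one weights tensor W[4, 4, 3, 2]: filter size, input channels, output channels. Moving the filters with a stride greater than 1 gives convolutional subsampling.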
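The shape arithmetic behind this slide can be sketched as follows. The sketch assumes TensorFlow-style "SAME" (zero padding) and "VALID" (no padding) conventions; the 28-pixel input width is a hypothetical example, while W[4, 4, 3, 2] is the weights tensor from the slide.

```python
import math

def conv_output_size(n, f, stride, padding):
    """Output width/height of a convolution over an n-wide input with an f-wide filter."""
    if padding == "SAME":      # zero padding keeps ceil(n / stride) positions
        return math.ceil(n / stride)
    if padding == "VALID":     # no padding: the filter must fit entirely inside the input
        return (n - f) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

def conv_weight_count(filter_h, filter_w, in_channels, out_channels):
    """Number of weights in a W[filter_h, filter_w, in_channels, out_channels] tensor."""
    return filter_h * filter_w * in_channels * out_channels

W_shape = [4, 4, 3, 2]   # filter size 4x4, 3 input channels, 2 output channels
print(conv_weight_count(*W_shape))            # 96
print(conv_output_size(28, 4, 1, "SAME"))     # 28
print(conv_output_size(28, 4, 2, "SAME"))     # 14: stride 2 subsamples
print(conv_output_size(28, 4, 1, "VALID"))    # 25
```

Each output channel usually also gets one bias, which would add 2 more parameters here.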




With state

[Figure: the graph extended with biases, Add and Mul ops, a learning rate and a −= op.]

'Biases' is a variable; −= updates biases; some ops compute gradients.
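A minimal sketch of a stateful variable updated by a −= op. The Variable class is hypothetical, and so is the squared-error loss, whose gradient is written out by hand here; in TensorFlow, the gradient ops would be generated for you. Each step subtracts learning rate × gradient from the biases, and the variable keeps its value between steps.

```python
class Variable:
    """Holds state (here, the biases) that persists from one training step to the next."""
    def __init__(self, value):
        self.value = list(value)

    def assign_sub(self, delta):
        """The '-=' op: subtract delta element-wise and store the result."""
        self.value = [v - d for v, d in zip(self.value, delta)]

def gradient(b, target):
    """Gradient of the hypothetical loss sum((b_i - target_i)**2) with respect to b."""
    return [2.0 * (bi - ti) for bi, ti in zip(b, target)]

biases = Variable([0.0, 0.0])
target = [1.0, -2.0]      # hypothetical values the biases should move towards
learning_rate = 0.1

for step in range(100):
    grad = gradient(biases.value, target)
    biases.assign_sub([learning_rate * g for g in grad])

print(biases.value)       # very close to [1.0, -2.0]
```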


And distributed

[Figure: the same stateful graph (biases, Add, Mul, learning rate, −=) partitioned across Device A and Device B.]

TensorFlow API stack, from high level to low level:

● Canned Estimators: models in a box
● Estimator, Keras Model: train and evaluate models
● Layers: build models
● Python Frontend, C++ Frontend, ...
● TensorFlow Distributed Execution Engine
● CPU, GPU, Android, iOS, ...


Artificial Intelligence: the science of making things smart

Machine Learning: building machines that can learn

Neural Network: a type of algorithm in machine learning


The popular imagination of what ML is

Lots of data → complex mathematics in multidimensional spaces → magical results


In reality, ML is

Define objectives → Collect data → Understand and prepare the data → Create the model → Refine the model → Serve the model


A neural network is a function that can learn

How about this?

A neural network can extract hidden features from data


A very simple model

[Figure: a 28x28-pixel image is flattened to 784 pixels; each of 10 neurons (digits 0 to 9) computes a weighted sum of all pixels + bias, and softmax is applied to the neuron outputs.]

[Figure: the same model for 100 images at a time. X holds 100 images, one per line, each flattened to 784 pixels; X is matrix-multiplied by W (784 lines of weights w0,0 … w783,9), giving L (L0,0 … L99,9), and the same 10 biases b0 … b9 are broadcast and added on all lines.]


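Going Deep, 5 Layers Deep

[Figure: a five-layer network with layer widths 784 → 200 → 100 → 60 → 30 → 10; sigmoid function on the hidden layers and softmax on the output (digits 0 to 9); annotated "overkill ;-)".]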
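The layer widths of this network fix the shapes of each fully connected layer's tensors, W[n_in, n_out] and b[n_out]. This plain-Python sketch, with a hypothetical helper, prints those shapes and counts the trainable parameters.

```python
layer_sizes = [784, 200, 100, 60, 30, 10]   # widths from the slide

def dense_param_count(sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    print(f"W[{n_in}, {n_out}]  b[{n_out}]")

print(dense_param_count(layer_sizes))   # 185300
print(dense_param_count([784, 10]))     # 7850: the single-layer softmax model
```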


Activation Functions

Tanh, Sigmoid, Binary Step, Identity, ReLU: each is applied to the weighted sum of all pixels + bias.

Softmax turns scores (logits) into probabilities: 2.0, 1.0, 0.1 → 0.7, 0.2, 0.1
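A pure-Python sketch of the activation functions named on this slide, plus softmax. Subtracting the largest score before exponentiating is a common stability trick and does not change the result. Rounded to one decimal, the output matches the slide's 0.7, 0.2, 0.1.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def binary_step(x):
    return 1.0 if x >= 0 else 0.0   # the value at exactly 0 is a convention

def identity(x):
    return x

def softmax(scores):
    """Turn scores (logits) into probabilities that sum to 1."""
    top = max(scores)   # subtracting the max avoids overflow without changing the result
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 2) for p in probs])   # [0.66, 0.24, 0.1]
```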


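Softmax on a batch of images

$$Y_{[100,10]} = \mathrm{softmax}\left(X_{[100,784]} \cdot W_{[784,10]} + b_{[10]}\right)$$

Y: predictions, X: images, W: weights, b: biases (tensor shapes in [ ]). X · W is a matrix multiply, b is broadcast on all lines, and softmax is applied line by line.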
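The batch formula can be sketched in plain Python. The helpers are hypothetical, not TensorFlow, and the 100 "images" are random numbers rather than real MNIST data. b is added to every line of X · W, and softmax is applied to each line. With all-zero weights and biases, like a tf.zeros initialisation, every class gets probability 0.1 for every image.

```python
import math
import random

def matmul(a, b):
    """[n, k] x [k, m] -> [n, m] for nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax(row):
    top = max(row)
    exps = [math.exp(z - top) for z in row]
    total = sum(exps)
    return [e / total for e in exps]

def model(X, W, b):
    """Y = softmax(X.W + b): b is broadcast on all lines, softmax applied line by line."""
    logits = [[z + bias for z, bias in zip(row, b)] for row in matmul(X, W)]
    return [softmax(row) for row in logits]

rng = random.Random(42)
X = [[rng.random() for _ in range(784)] for _ in range(100)]   # 100 fake flattened images
W = [[0.0] * 10 for _ in range(784)]                            # like tf.zeros([784, 10])
b = [0.0] * 10                                                  # like tf.zeros([10])

Y = model(X, W, b)
print(len(Y), len(Y[0]))   # 100 10
print(Y[0])                # all ten entries are 0.1 while the weights are zero
```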

Demo

http://www.youtube.com/watch?v=LeAacAzd6oY


Cross entropy:

$$-\sum_i Y'_i \log(Y_i)$$

where Y' are the actual probabilities, "one-hot" encoded, and Y are the computed probabilities. Example over the digits 0 to 9:

actual (this is a "6"): 0 0 0 0 0 0 1 0 0 0
computed: 0.1 0.2 0.1 0.3 0.2 0.1 0.9 0.2 0.1 0.1

Success?
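The cross-entropy for the "6" example can be computed directly with the illustrative values from the slide. Those values do not sum to 1, but here that does not matter, because the one-hot vector keeps only the "6" term, so the loss is −log(0.9). The helper name is hypothetical.

```python
import math

computed = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.9, 0.2, 0.1, 0.1]   # from the slide
actual = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]                          # one-hot: this is a "6"

def cross_entropy(y_actual, y_computed):
    """-sum(Y'_i * log(Y_i)), skipping terms where Y'_i is 0."""
    return -sum(a * math.log(c) for a, c in zip(y_actual, y_computed) if a > 0)

loss = cross_entropy(actual, computed)
predicted_digit = max(range(10), key=lambda i: computed[i])
print(round(loss, 4))        # 0.1054, i.e. -log(0.9)
print(predicted_digit)       # 6
```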


Gradient Descent


The whole code (slide callouts: initialisation, model, success metrics, training step, run)

```python
import tensorflow as tf  # TensorFlow 1.x API (placeholders, sessions)
from tensorflow.examples.tutorials.mnist import input_data

# MNIST as 28x28x1 images with one-hot labels ("data" is the download directory)
mnist = input_data.read_data_sets("data", one_hot=True, reshape=False)

# initialisation
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# model
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# success metrics: % of correct answers found in batch
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# training step
optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)

# run: create the initialiser once all variables exist
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(10000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success? (print a and c here to follow training)
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

# success on test data?
test_data = {X: mnist.test.images, Y_: mnist.test.labels}
a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
print("test accuracy:", a, "cross-entropy:", c)
```

Workshop

Self-paced code lab (summary below ↓): goo.gl/mVZloU

Code: github.com/martin-gorner/tensorflow-mnist-tutorial

1-5. Theory (install then sit back and listen or read)

Neural networks 101: softmax, cross-entropy, mini-batching, gradient descent, hidden layers, sigmoids, and how to implement them in TensorFlow

6. Practice (full instructions for this step)

Open file: mnist_1.0_softmax.py

Run it, play with the visualisations (keyboard shortcuts on previous slide), read and understand the code as well as the basic structure of a TensorFlow program.

7. Practice (full instructions for this step)

Start from the file mnist_1.0_softmax.py and add one or two hidden layers.

Solution in: mnist_2.0_five_layers_sigmoid.py


8. Theory (sit back and listen or read)

Special care for deep neural networks: use ReLU activation functions, use a better optimiser, initialise weights with random values and beware of the log(0)
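The log(0) problem appears when a softmax probability underflows to exactly 0 before its logarithm is taken. A common remedy is to compute cross-entropy from the logits with the log-sum-exp trick. In TensorFlow 1.x, tf.nn.softmax_cross_entropy_with_logits takes logits for this reason. The pure-Python sketch below shows the idea. The helper names and the extreme logits are hypothetical.

```python
import math

def log_softmax(logits):
    """log(softmax(z)) via log-sum-exp: no overflow in exp and no log(0)."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_sum for z in logits]

def stable_cross_entropy(logits, label):
    """Cross-entropy for one example whose correct class index is `label`."""
    return -log_softmax(logits)[label]

def naive_cross_entropy(logits, label):
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[label] / sum(exps))   # log(0) if exps[label] underflows

logits = [0.0, -800.0]                    # hypothetical, very confident wrong prediction
print(stable_cross_entropy(logits, 1))    # 800.0
try:
    naive_cross_entropy(logits, 1)
except ValueError as err:                 # math.log(0.0) raises "math domain error"
    print("naive version failed:", err)
```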

9-10. Practice (full instructions for this step)

Use a decaying learning rate and then add dropout

Solution in: mnist_2.2_five_layers_relu_lrdecay_dropout.py
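A decaying learning rate and dropout can be sketched in a few lines of plain Python. The exponential schedule and its constants (max_lr, min_lr, decay_speed) are hypothetical choices, not necessarily those of the solution file. The dropout shown is "inverted" dropout: during training, each activation is kept with probability pkeep and scaled by 1/pkeep, and during evaluation, pkeep is 1.

```python
import math
import random

def decayed_learning_rate(step, max_lr=0.003, min_lr=0.0001, decay_speed=2000.0):
    """Exponential decay from max_lr towards min_lr (constants are hypothetical)."""
    return min_lr + (max_lr - min_lr) * math.exp(-step / decay_speed)

def dropout(activations, pkeep, rng):
    """Inverted dropout: keep each value with probability pkeep, scale kept ones by 1/pkeep."""
    return [a / pkeep if rng.random() < pkeep else 0.0 for a in activations]

for step in (0, 2000, 10000):
    print(step, round(decayed_learning_rate(step), 6))

rng = random.Random(0)   # seeded so the dropout mask is reproducible
print(dropout([1.0] * 8, pkeep=0.75, rng=rng))   # training: some zeros, the rest 1/0.75
print(dropout([1.0] * 8, pkeep=1.0, rng=rng))    # evaluation: pkeep=1 keeps everything
```

Scaling the kept activations by 1/pkeep keeps their expected value unchanged, so the same weights work at evaluation time without any correction.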

11. Theory (sit back and listen or read)

Convolutional networks


12. Practice (full instructions for this step)

Replace your model with a convolutional network, without dropout.

Solution in: mnist_3.0_convolutional.py

13. Challenge (full instructions for this step)

Try a bigger neural network (good hyperparameters on slide 43) and add dropout on the last layer to get >99%

Solution in: mnist_3.0_convolutional_bigger_dropout.py


https://goo.gl/mVZloU

https://github.com/martin-gorner/tensorflow-mnist-tutorial

https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/#1













Distributed TensorFlow on Compute Engine

https://cloud.google.com/solutions/running-distributed-tensorflow-on-compute-engine


Machine Learning on any data, of any size

Cloud ML Engine: portable models with TensorFlow

● Services are designed to work together
● Managed distributed training infrastructure that supports CPUs and GPUs
● Automatic hyperparameter tuning

Custom Estimators: The Model

https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census

```python
...
def _model_fn(mode, features, labels):
    ...
    if mode == Modes.PREDICT:
        ...
        return tf.estimator.EstimatorSpec(
            mode, predictions=predictions, export_outputs=export_outputs)
    ...
    if mode == Modes.TRAIN:
        ...
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
    ...
```

https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census

Custom Estimators: The Task

```python
...
def _experiment_fn(run_config, hparams):
    """This function is used by learn_runner to create an Experiment which
    executes model code provided in the form of an Estimator and input functions."""
    train_input = lambda: model.generate_input_fn(
        hparams.train_files,
        num_epochs=hparams.num_epochs,
        batch_size=hparams.train_batch_size)
    ...
    # the input functions belong to the Experiment, not to the Estimator
    return tf.contrib.learn.Experiment(
        tf.estimator.Estimator(
            model.generate_model_fn(
                ...)),
        train_input_fn=train_input,
        eval_input_fn=eval_input,
        **experiment_args
    )
...
```


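Running locally

```shell
gcloud ml-engine local train \
    --module-name trainer.task --package-path trainer/ \
    -- \
    --train-files $TRAIN_DATA --eval-files $EVAL_DATA --train-steps 1000 --job-dir $MODEL_DIR
```

Callouts: local train trains locally; $TRAIN_DATA is the training data, $EVAL_DATA the evaluation data and $MODEL_DIR the output directory. Arguments after the bare -- are passed to the trainer module rather than to gcloud.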
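On the trainer side, the flags after -- arrive as ordinary command-line arguments. The sketch below shows how a trainer/task.py might parse them with argparse. The parser, its help texts, the nargs='+' choice and the sample paths are assumptions, not the sample's actual code; the flag names come from the commands above. argparse turns --train-files into args.train_files.

```python
import argparse

def build_parser():
    """Hypothetical trainer/task.py argument parser for the flags passed after `--`."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--train-files', nargs='+', required=True,
                        help='Training data file(s), local or on Cloud Storage')
    parser.add_argument('--eval-files', nargs='+', required=True,
                        help='Evaluation data file(s), local or on Cloud Storage')
    parser.add_argument('--train-steps', type=int, default=1000,
                        help='Number of training steps')
    parser.add_argument('--job-dir', required=True,
                        help='Output directory for checkpoints and exported models')
    parser.add_argument('--verbosity', default='INFO',
                        choices=['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'])
    return parser

# Simulates the arguments a local run would pass (hypothetical paths)
args = build_parser().parse_args([
    '--train-files', 'data/train.csv',
    '--eval-files', 'data/eval.csv',
    '--train-steps', '1000',
    '--job-dir', 'output'])
print(args.train_files, args.eval_files, args.train_steps, args.job_dir)
```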


Single trainer running in the cloud

```shell
gcloud ml-engine jobs submit training $JOB_NAME --job-dir $OUTPUT_PATH \
    --runtime-version 1.0 --module-name trainer.task --package-path trainer/ --region $REGION \
    -- \
    --train-files $TRAIN_DATA --eval-files $EVAL_DATA --train-steps 1000 --verbosity DEBUG
```

Callouts: jobs submit training trains in the cloud; $REGION is the region; $OUTPUT_PATH is a Google Cloud Storage location.


Distributed training in the cloud

The same submit command, with --scale-tier STANDARD_1 added before the -- separator, runs training distributed across several machines.




Refine the model

Feature engineering

Better algorithms

More examples, more data

Hyperparameter tuning



● Automatic hyperparameter tuning service

● Build better performing models faster and save many hours of manual tuning

● Google-developed search (Bayesian Optimisation) algorithm efficiently finds better hyperparameters for your model/dataset

[Figure: the objective plotted against HyperParam #1, with one point marked "We want to find this" and others marked "Not these".]

https://cloud.google.com/blog/big-data/2017/08/hyperparameter-tuning-in-cloud-machine-learning-engine-using-bayesian-optimization






Hypertuning: the same submit command, with --scale-tier STANDARD_1 --config $HPTUNING_CONFIG added before the -- separator.



hptuning_config.yaml

```yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: accuracy
    maxTrials: 4
    maxParallelTrials: 2
    params:
      - parameterName: first-layer-size
        type: INTEGER
        minValue: 50
        maxValue: 500
        scaleType: UNIT_LINEAR_SCALE
      # ...
```

task.py

```python
...
# Construct layer sizes with exponential decay
hidden_units=[
    max(2, int(hparams.first_layer_size * hparams.scale_factor**i))
    for i in range(hparams.num_layers)
],
...
parser.add_argument(
    '--first-layer-size',
    help='Number of nodes in the 1st layer of the DNN',
    default=100,
    type=int
)
...
```
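The hidden_units expression above can be run on its own to see the layer sizes it produces. In this sketch the function name and the hyperparameter values are hypothetical. In a tuning trial, first_layer_size would be whatever value between 50 and 500 the service picked for --first-layer-size, and max(2, ...) keeps deep layers from shrinking below 2 nodes.

```python
import argparse

def hidden_layer_sizes(hparams):
    """Layer sizes that shrink geometrically by scale_factor, never below 2 nodes."""
    return [
        max(2, int(hparams.first_layer_size * hparams.scale_factor ** i))
        for i in range(hparams.num_layers)
    ]

# Hypothetical values standing in for parsed command-line hyperparameters
hparams = argparse.Namespace(first_layer_size=100, scale_factor=0.5, num_layers=4)
print(hidden_layer_sizes(hparams))   # [100, 50, 25, 12]

hparams.num_layers = 8
print(hidden_layer_sizes(hparams))   # [100, 50, 25, 12, 6, 3, 2, 2]
```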




Deploying the model

Creating model

gcloud ml-engine models create $MODEL_NAME --regions=$REGION

Creating versions

gcloud ml-engine versions create v1 --model $MODEL_NAME --origin $MODEL_BINARIES --runtime-version 1.0

gcloud ml-engine models list


Predicting

gcloud ml-engine predict --model $MODEL_NAME --version v1 --json-instances ../test.json

Using REST:

POST https://ml.googleapis.com/v1/{name=projects/**}:predict

JSON format (in this case):

{"age": 25, "workclass": "private", "education": "11th", "education_num": 7, "marital_status": "Never-married", "occupation": "machine-op-inspector", "relationship": "own-child", "gender": " male", "capital_gain": 0, "capital_loss": 0, "hours_per_week": 40, "native_country": " United-States"}
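The REST call can be sketched with the Python standard library. The request body wraps instances in an "instances" list, while the file given to gcloud with --json-instances holds one JSON instance per line. The project and model names and the ACCESS_TOKEN placeholder are hypothetical; a real token could come from gcloud auth print-access-token, and urllib.request.urlopen(req) would send the request. The sketch only builds it.

```python
import json
import urllib.request

def build_predict_request(project, model, instances, version=None, token="ACCESS_TOKEN"):
    """Build, without sending, a POST to the ML Engine v1 :predict method."""
    name = f"projects/{project}/models/{model}"
    if version is not None:
        name += f"/versions/{version}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"https://ml.googleapis.com/v1/{name}:predict",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"})

instance = {"age": 25, "workclass": "private", "education": "11th", "education_num": 7,
            "marital_status": "Never-married", "occupation": "machine-op-inspector",
            "relationship": "own-child", "gender": " male", "capital_gain": 0,
            "capital_loss": 0, "hours_per_week": 40, "native_country": " United-States"}

req = build_predict_request("my-project", "census", [instance], version="v1")
print(req.get_method(), req.full_url)
# POST https://ml.googleapis.com/v1/projects/my-project/models/census/versions/v1:predict
```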

