KNOWING WHEN TO STOP: EVALUATION Discrete inputs ...KNOWING WHEN TO STOP: EVALUATION AND...

1
KNOWING WHEN TO STOP: EVALUATION AND VERIFICATION OF CONFORMITY TO OUTPUT-SIZE SPECS Chenglong Wang 1 , Rudy Bunel 2 , Krishnamurthy Dvijotham 3 , Po-Sen Huang 3 , Edward Grefenstette 4 , Pushmeet Kohli 3 1 University of Washington, 2 University of Oxford, 3 Deepmind, 4 Facebook AI Research PROBLEM MOTIVATION ROBUSTNESS OBJECTIVE EXPERIMENTS I: Die Wae wird ausgestellt und durch den Zaun ubergeben. O: The weapon is issued and handed over by the fence. eos Adv Input: Die namen name descri und ames utt origin i.e. meet grammatisch. Adv output: names name names name names grammatically name names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names eos 1. Perturb test inputs randomly 2. Perturb test input with PGD attack Output size distribution for perturbed inputs. % of verified / vulnerable samples at different δ (Verification: 30 minutes timeout for each image) Original Adversarial Original Output: [6, 1], [0, 7, 4], [3] Adversarial: [6, 1, 1], [0, 1, 4, 3], [3, 3, 5, 3] Machine Translation (Seq2Seq) Multi-MNIST (Image to text) <END> <START> Einkaufen vom Sofa aus Shopping from Shopping sofa <END> <START> 6 1 6 1 CNN We study the termination problem of variable length computing models (Image2Text, Seq2Seq) <START> y 1 Encoder y 2 x’ Q: Given a model M, a sample input x, does the model terminates in K steps for all inputs x’ = x + δ? A: A new testing algorithm and the first verification algorithm to eval robustness of sequence models x Attacker Model M Sample Verifier Find adversarial input x’ Prove no such x’ exists Unknown (otherwise) German to English 27.6 BLEU score, Vocab size: 36,548 CNN — Relu-RNN 91.2% test accuracy Achieving Computational Robustness ML as a service Adaptive-time computation model Understanding and Debugging Models Discover abnormal model behaviors (e.g., privacy) Canonical specification for testing variable-length models Diff Given model M, sample x, M should terminate in K steps on all inputs x’ close to x. Greedy Decoder: OUR APPROACH Input Space X Perturbation Space S (x, δ ) Image [-1, 1] mn {x 0 2 X | ||x 0 - x|| 1 δ } Text V n {x 0 2 V n | n P i=1 [x i = x 0 i ] δ · n} Testing: Adversarial Attacks Verification: Constraint Encoding y t+1 = arg max {P (·|x, y 0:t )} L (M (x)) = l s.t. y t = eos y i 6= eos 8i<t Challenge 1: J(x) is Non-Differentiable Challenge 2: Discrete Inputs (Seq2Seq) Continuous Relaxation with Gumbel Softmax x 0 = S (x,δ ) (x + r x J (x)) Project to S (x, δ ) Exhaustive Search with Mixed Integer Programming (MIP) Solver Find x’ that maximizes L(M(x’)) Objective: L(M (x)) Prove x: L(M(x)) K Robust Vulnerable M CNN- RNN x M(x) L K L > K M(x) L K L > K M CNN- RNN x M(x) L K M(x) MIP Solver SAT UNSAT o max = max y2Y (o y ) ) o max o y , δ y 2 {0, 1} 8y 2 Y o max o y +(u - l y )(1 - δ y ) 8y 2 Y X y2Y δ y =1 ArgMax Encoding: Objective: max min t=1...K max z 6=eos y t [z ] - y t [eos] > 0 ˜ J (x)= k X t=1 max y t [eos] - max z 6=eos y t [z ], x =(x 1 ,...,x n ) Greedily minimize stop probability logit i = 2 6 6 6 6 4 ... -1 1 -1 ... 3 7 7 7 7 5 |V | x i = 2 6 6 6 6 4 ... 0 1 0 ... 3 7 7 7 7 5 |V | 2 3 ˜ x i Gumbel(i ) ˜ x = (˜ x 1 ,..., ˜ x n ) Projected Gradient Descent (PGD) Attack Complete but Expensive Scalable but No Guarantee Attack: δ = 0.25 Example:

Transcript of KNOWING WHEN TO STOP: EVALUATION Discrete inputs ...KNOWING WHEN TO STOP: EVALUATION AND...

  • KNOWING WHEN TO STOP: EVALUATION AND VERIFICATION OF CONFORMITY TO OUTPUT-SIZE SPECSChenglong Wang1, Rudy Bunel2, Krishnamurthy Dvijotham3, Po-Sen Huang3, Edward Grefenstette4, Pushmeet Kohli3

    1University of Washington, 2University of Oxford, 3Deepmind, 4Facebook AI Research

    PROBLEM

    MOTIVATION

    ROBUSTNESS OBJECTIVE EXPERIMENTS

    I: Die Waffe wird ausgestellt und durch den Zaun ubergeben. O: The weapon is issued and handed over by the fence. eos

    Adv Input: Die namen name descri und ames utt origin i.e. meet grammatisch. Adv output: names name names name names grammatically name names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names names eos

    1. Perturb test inputs randomly

    2. Perturb test input with PGD attack

    Output size distribution for perturbed inputs.

    % of verified / vulnerable samples at different δ (Verification: 30 minutes timeout for each image)

    Original Adversarial

    Original Output: [6, 1], [0, 7, 4], [3] Adversarial: [6, 1, 1], [0, 1, 4, 3], [3, 3, 5, 3]

    Machine Translation (Seq2Seq) Multi-MNIST (Image to text)

    … …

    Einkaufen vom Sofa aus

    Shopping from

    Shopping sofa

    6 1

    6 1

    CNN

    We study the termination problem of variable length computing models (Image2Text, Seq2Seq)

    y1

    Encoder …

    y2

    x’

    Q: Given a model M, a sample input x, does the model terminates in K steps for all inputs x’ = x + δ?

    A: A new testing algorithm and the first verification algorithm to eval robustness of sequence models

    x

    AttackerModel M

    Sample Verifier

    Find adversarial input x’

    Prove no such x’ exists

    Unknown(otherwise)

    German to English 27.6 BLEU score, Vocab size: 36,548

    CNN — Relu-RNN 91.2% test accuracy

    Achieving Computational Robustness ‣ ML as a service ‣ Adaptive-time computation model

    Understanding and Debugging Models ‣ Discover abnormal model behaviors (e.g., privacy)

    Canonical specification for testing variable-length models

    Diff

    Given model M, sample x, M should terminate in K steps on all inputs x’ close to x.Greedy Decoder:

    OUR APPROACH

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    1

    Testing: Adversarial Attacks Verification: Constraint Encoding

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    1

    Challenge 1: J(x) is Non-Differentiable

    Challenge 2: Discrete Inputs (Seq2Seq) Continuous Relaxation with Gumbel Softmax

    Discrete inputs: For discrete inputs, e.g., machine trans-lation tasks, inputs are discrete tokens in a language vocab-ulary. Formally, given the vocabulary V of the input lan-guage, the input space X is defined as all sentences com-posed of tokens in V , i.e., X = {(x1, . . . , xn) | xi 2V, n > 0}. Given an input sequence x = (x1, . . . , xn),we define the �-perturbation space of a sequence as all se-quences of length n with at most d� · ne tokens differentfrom x (i.e., � 2 [0, 1] denotes the percentage of tokens thatan attacker is allowed to modify). Formally, the perturba-tion space S(x, �) is defined as follows:

    S(x, �) = {(x01, . . . , x0n) 2 V n |nP

    i=1[xi 6= x0i] d� · ne}

    3.2. Extending Projected Gradient Descent Attacks

    In the projected gradient descent (PGD) attacks [25],2given an objective function J(x), the attacker calculates theadversarial example by searching for inputs in the attackspace to maximize J(x). In the basic attack algorithm, weperform the following updates at each iteration:

    x0 = ⇧S(x,�) (x+ ↵rxJ(x)) (5)

    where ↵ > 0 is the step size and ⇧S(x,�) denotes the pro-jection of the attack to the valid space S(x, �). Observe thatthe adversarial objective in Eq. (4) cannot be directly usedas J(x) to update x as the length of the sequence is nota differentiable objective function. This hinders the directapplication of PGD to output-lengthening attacks. Further-more, when the input space S is discrete, gradient descentcannot be directly be used because it is only applicable tocontinuous input spaces.

    In the following, we show our extensions of the PGDattack algorithm to handle these challenges.

    Greedy approach for sequence lengthening We intro-duce a differentiable proxy of `(x). Given an input x whosedecoder output logits are (o1, . . . , ok) (i.e., the decoded se-quence is y = (argmax(o1), . . . , argmax(ok))), instead ofdirectly maximizing the output sequence length, we use agreedy algorithm to find an output sequence whose lengthis longer than k by minimizing the probability of the modelto terminate within k steps. In other words, we minimizethe log probability of the model to produce eos at any ofthe timesteps between 1 to k. Formally, the proxy objectiveJ̃ is defined as follows:

    J̃(x) =kP

    t=1max

    ⇢ot[eos]� max

    z 6=eosot[z], �✏

    where ✏ > 0 is a hyperparameter to clip the loss. Thisis piecewise differentiable w.r.t. the inputs x (in the same

    2Here the adversarial objective is stated as maximization, so the algo-rithm is Projected Gradient Ascent, but we stick with the PGD terminologysince it is standard in the literature

    sense that the ReLU function is differentiable) and can beefficiently optimized using PGD.

    3.3. Continuous relaxation for discrete inputs

    While we can apply the PGD attack with the proxy ob-jective on the model with continuous inputs by setting theprojection function ⇧S(x,�) as the Euclidean projection, wecannot directly update discrete inputs. To enable a PGD-type attack in the discrete input space, we use the Gumbeltrick [18] to reparameterize the input space to perform con-tinuous relaxation of the inputs.

    Given an input sequence x = (x1, . . . , xn), for eachxi, we construct a distribution ⇡i 2 R|V | initialized with⇡i[xi] = 1 and ⇡i[z] = �1 for all z 2 V \ {xi}. The soft-max function applied to ⇡i is a probability distribution overinput tokens at position i with a mode at xi. With this repa-rameterization, instead of feeding x = (x1, . . . , xn) intothe model, we feed the Gumbel softmax sampling from thedistribution (u1, . . . , un). The sample x̃ = (x̃1, . . . , x̃n) iscalculated as follows:

    ui ⇠ Uniform(0, 1); gi = � log(� log(ui))p = softmax(⇡); x̃i = softmax( gi+log pi⌧ )

    where ⌧ is the Gumbel-softmax sampling temperature thatcontrols the discreteness of x̃. With this relaxation, we per-form PGD attack on the distribution ⇡ at each iteration.Since ⇡i is unconstrained, the projection step in (5) is un-necessary.

    When the final ⇡0 = (⇡01, . . . ,⇡n) is obtained from thePGD attack, we draw samples x0i ⇠ Categorical(⇡i) to getthe final adversarial example for the attack.

    4. Verified Bound on Output Length

    While heuristics approaches can be useful in finding at-tacks, they can fail due to the difficulty of optimizing non-differentiable nonconvex functions. These challenges showup particularly when the perturbation space is small or whenthe target model is trained with strong bias in the trainingdata towards short output sequences (e.g., the Show-and-Tell model as we will show in Section 6). Thus, we designa formal verification approach for complete reasoning of theoutput-size modulation problem, i.e., finding provable guar-antees that no input within a certain set of interest can resultin an output sequence of length above a certain threshold.

    Our approach relies on counterexample search using in-telligent brute-force search methods, taking advantage ofpowerful modern integer programming solvers [15]. We en-code all the constraints that an adversarial example shouldsatisfy as linear constraints, possibly introducing additionalbinary variables. Once in the right formalism, these can befed into an off-the-shelf Mixed Integer Programming (MIP)solver, which provably solves the problem, albeit with a po-tentially large computational cost. The constraints consist

    4324

    Project to Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    1

    Exhaustive Search with Mixed Integer Programming (MIP) Solver

    Find x’ that maximizes L(M(x’))

    Objective:

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    maxx2S

    L(M(x))

    1

    Prove ∀x: L(M(x)) ≤ K

    RobustVulnerable

    M CNN-RNN

    xM(x)

    L ≤ K L > K

    M CNN-RNN

    xM(x)

    L ≤ K L > K

    M CNN-RNN

    xM(x)

    L ≤ K

    M(x)

    MIP Solver

    SAT UNSAT

    of four parts: (1) the initial restrictions on the model inputs(encoding S(x, �)), (2) the relations between the differentactivations of the network (implementing each layer), (3)the decoding strategy (connection between the output logitsand the inputs at the next step), and (4) the condition for itbeing a counterexample (ie. a sequence of length larger thanthe threshold). In the following, we show how each part ofthe constraints is encoded into MIP formulas.

    Our formulation is inspired by pior work on encodingfeed-forward neural networks as MIPs [3, 7, 32]. The im-age captioning model we use consists of an image embed-ding model, a feedforward convolutional neural networkthat computes an embedding of the image, followed by arecurrent network that generates tokens sequentially start-ing with the initial hidden state set to the image embedding.

    The image embedding model is simply a sequence of lin-ear or convolutional layers and ReLU activation functions.Linear and convolutional layers are trivially encoded as lin-ear equality constraints between their inputs and outputs,while ReLUs are represented by introducing a binary vari-able and employing the big-M method [16]:

    xi = max (x̂i, 0) ) �i 2 {0, 1}, xi � 0 (6a)xi ui · �i, xi � x̂i (6b)xi x̂i � li · (1� �i) (6c)

    with li and ui being lower and upper bounds of x̂i whichcan be obtained using interval arithmetic (details in [3]).

    Our novel contribution is to introduce a method to extendthe techniques to handle greedy decoding used in recurrentnetworks. For a model with greedy decoding, the token withthe most likely prediction is fed back as input to the nexttime step. To implement this mechanism as a mixed integerprogram, we employ a big-M method [36]:

    omax =maxy2Y

    (oy)

    ) omax � oy, �y 2 {0, 1} 8y 2 Y (7a)omax oy + (u� ly)(1� �y) 8y 2 Y (7b)X

    y2Y�y = 1 (7c)

    with ly, uy being a lower/upper bound on the value of oyand u = maxy2Y uy (these can again be computed us-ing interval arithmetic). Implementing the maximum inthis way gives us both a variable representing the value ofthe maximum (omax), as well as a one-hot encoding of theargmax (�y). If the embedding for each token is given by{embi | i 2 Y}, we can simply encode the input to the fol-lowing RNN timestep as

    Py2Y �y · emby , which is a linear

    function of the variables that we previously constructed.With this mechanism to encode the greedy decoding, we

    can now unroll the recurrent model for the desired numberof timesteps. To search for an input x with output length

    ` (x) � K̂, we unroll the recurrent network for K̂ stepsand attempt to prove that at each timestep, eos is not themaximum logit, as in (2). We setup the problem as:

    max mint=1..K̂

    maxz 6=eos

    ot[z]� ot[eos]�

    (8)

    where o(k) represents the logits in the k-th decoding step.We use an encoding similar to the one of Equation (7) torepresent the objective function as a linear objective withadded constraints. If the global optimal value of Eq. (8)is positive, this is a valid counterexample: at all timestepst 2 [1..K̂], there is at least one token greater than the eostoken, which means that the decoding should continue. Onthe other hand, if the optimal value is negative, that meansthat those conditions cannot be satisfied and that it is notpossible to generate a sequence of length greater than K̂.The eos token would necessarily be predicted before. Thiswould imply that our robustness property is True.

    5. Target Model Mechanism

    We use image captioning and machine translation mod-els as specific target examples to study the output lengthmodulation problem. We now introduce their mechanism.

    Image captioning models The image captioning modelwe consider is an encoder-decoder model composed of twomodules: a convolution neural network (CNN) as an en-coder for image feature extraction and a recurrent neuralnetwork (RNN) as a decoder for caption generation [34].

    Formally, the input to the model x is an m⇥ n sized im-age from the space X = [�1, 1]m⇥n, the CNN-RNN modelcomputes the output sequence as follows:

    i0 = CNN(x); h0 = 0ot, ht+1 = RNNCell(it, ht)yt = argmax(ot); it+1 = emb(yt)

    where emb denotes the embedding function.The captioning model first run the input image x through

    a CNN to obtain the image embedding and feed it to theRNN as the initial input i0 along with the initial state h0.At each decode step, the RNN uses the input it and state htto compute the new state ht+1 as well as the logits ot repre-senting the log-probability of the output token distributionin the vocabulary. The output yt is the token in the vocabu-lary with highest probability based on ot, and it is embeddedinto the continuous space using an embedding matrix Wembas Wemb[yt]. The embedding is fed to the next RNN cell asthe input for the next decoding step.

    Machine translation models The machine translationmodel is an encoder-decoder model [30, 9] with both the

    4325

    ArgMax Encoding:

    Objective:

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    maxx2S

    L(M(x))

    max mint=1...K

    maxz 6=eos

    yt[z]� yt[eos]�> 0

    1

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    maxx2S

    L(M(x))

    max mint=1...K

    maxz 6=eos

    yt[z]� yt[eos]�> 0

    J̃(x) =kX

    t=1

    max

    ⇢yt[eos]� max

    z 6=eosyt[z], ✏

    1

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    maxx2S

    L(M(x))

    max mint=1...K

    maxz 6=eos

    yt[z]� yt[eos]�> 0

    J̃(x) =kX

    t=1

    max

    ⇢yt[eos]� max

    z 6=eosyt[z], ✏

    x = (x1, . . . , xn)

    1

    Greedily minimize stop probability

    logit

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    maxx2S

    L(M(x))

    max mint=1...K

    maxz 6=eos

    yt[z]� yt[eos]�> 0

    J̃(x) =kX

    t=1

    max

    ⇢yt[eos]� max

    z 6=eosyt[z], ✏

    x = (x1, . . . , xn)

    onehot(xi) =

    2

    66664

    . . .010. . .

    3

    77775

    |V |

    ⇡i =

    2

    66664

    . . .�11�1. . .

    3

    77775

    |V |

    1

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    maxx2S

    L(M(x))

    max mint=1...K

    maxz 6=eos

    yt[z]� yt[eos]�> 0

    J̃(x) =kX

    t=1

    max

    ⇢yt[eos]� max

    z 6=eosyt[z], ✏

    x = (x1, . . . , xn)

    xi =

    2

    66664

    . . .010. . .

    3

    77775

    |V |

    ⇡i =

    2

    66664

    . . .�11�1. . .

    3

    77775

    |V |

    1

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    maxx2S

    L(M(x))

    max mint=1...K

    maxz 6=eos

    yt[z]� yt[eos]�> 0

    J̃(x) =kX

    t=1

    max

    ⇢yt[eos]� max

    z 6=eosyt[z], ✏

    x = (x1, . . . , xn)

    xi =

    2

    66664

    . . .010. . .

    3

    77775

    |V |

    ⇡i =

    2

    66664

    . . .�11�1. . .

    3

    77775

    |V |

    x̃i ⇠ Gumbel(⇡i)

    1

    Input Space X Perturbation Space S(x, �)

    Image [�1, 1]m⇥n {x0 2 X | ||x0 � x||1 �}

    Text V n {x0 2 V n |nP

    i=1[xi = x0i] � · n}

    yt+1 = argmax {P (·|x, y0:t)}

    yt+1 = argmax {P (·|x, y0:t)}

    L (M(x)) = l s.t.yt = eos

    yi 6= eos 8i < t

    maxx2S

    L(M(x))

    max mint=1...K

    maxz 6=eos

    yt[z]� yt[eos]�> 0

    J̃(x) =kX

    t=1

    max

    ⇢yt[eos]� max

    z 6=eosyt[z], ✏

    x = (x1, . . . , xn)

    x̃ = (x̃1, . . . , x̃n)

    xi =

    2

    66664

    . . .010. . .

    3

    77775

    |V |

    ⇡i =

    2

    66664

    . . .�11�1. . .

    3

    77775

    |V |

    x̃i ⇠ Gumbel(⇡i)

    1

    Projected Gradient Descent (PGD) Attack

    Complete but ExpensiveScalable but No Guarantee

    Attack: δ = 0.25

    Example: