
Full accounting for verifiable outsourcing

Riad S. Wahby (Stanford University and New York University), Ye Ji (New York University), Andrew J. Blumberg (The University of Texas at Austin), abhi shelat (Northeastern University), Justin Thaler (Georgetown University), Michael Walfish (New York University), and Thomas Wies (New York University)

July 6th, 2017

Probabilistic proofs enable outsourcing

[Diagram: a client sends a program and inputs to a server; the server returns the outputs plus a short proof.]

Approach: the server's response includes a short proof of correctness.

[Babai85, GMR85, BCC86, BFLS91, FGLSS91, ALMSS92, AS92, Kilian92, LFKN92, Shamir92, Micali00, BG02, BS05, GOS06, BGHSV06, IKO07, GKR08, KR09, GGP10, Groth10, GLR11, Lipmaa11, BCCT12, GGPR13, BCCT13, Thaler13, KRR14, . . . ]

[Figure: a cloud of built systems: SBW11, CMT12, SMBW12, TRMP12, SVPBBW12, SBVBPW13, VSBW13, PGHR13, BCGTV13, BFRSBW13, BFR13, DFKP13, BCTV14a, BCTV14b, BCGGMTV14, FL14, KPPSST14, FTP14, WSRHBW15, BBFR15, CFHKNPZ15, CTV15, KZMQCPPsS15, D-LFKP16, NT16, ZGKPP17, . . . ]


Goal: outsourcing should be less expensive than just executing the computation.


Do systems achieve this goal?

Verifier: can easily check the proof (asymptotically)

Prover: has massive overhead (≈10,000,000×)

Precomputation: proportional to the computation's size

How do systems handle these costs?

Precomputation: amortize it over many instances

Prover: assume it is >10⁸× cheaper than the verifier


Our contribution

Giraffe: the first system to consider all costs and win.

In Giraffe, P really is 10⁸× cheaper than V! (Setting: building trustworthy hardware.)

Giraffe extends Zebra [WHGsW, Oakland16] with:
• an asymptotically optimal proof protocol that improves on prior work [Thaler, CRYPTO13]
• a compiler that generates optimized hardware designs from a subset of C

Bottom line: Giraffe makes outsourcing worthwhile (. . . sometimes).


Roadmap

1. Verifiable ASICs

2. Giraffe: a high-level view

3. Evaluation


How can we build trustworthy hardware?

[Diagram: a firewall, e.g., a custom chip for network packet processing whose manufacture we outsource to a third party.]

What if the chip's manufacturer inserts a back door? Untrusted manufacturers can craft hardware Trojans.

Threat: incorrect execution of the packet filter.
(Other concerns, e.g., secret state, are important but orthogonal.)

The US DoD controls its supply chain with trusted foundries.


Trusted fabs are the only way to get strong guarantees

For example, stealthy trojans can thwart post-fab detection [A2: Analog Malicious Hardware, Yang et al., Oakland16; Stealthy Dopant-Level Trojans, Becker et al., CHES13]

But trusted fabrication is not a panacea:

✗ Only 5 countries have cutting-edge fabs on-shore

✗ Building a new fab takes $$$$$$ and years of R&D

✗ Semiconductor scaling: chip area and energy go with the square and cube of transistor length ("critical dimension"); see the worked example after this list

✗ So using an old fab means an enormous performance hit, e.g., India's best on-shore fab is 10⁸× behind the state of the art
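As a rough illustration of that scaling rule, using the 350 nm (trusted) and 7 nm (untrusted) nodes assumed later in the evaluation:

$$\left(\tfrac{350\,\mathrm{nm}}{7\,\mathrm{nm}}\right)^2 = 50^2 = 2{,}500\times \text{ (area)}, \qquad \left(\tfrac{350\,\mathrm{nm}}{7\,\mathrm{nm}}\right)^3 = 50^3 = 125{,}000\times \text{ (energy)}$$

So a chip built at the older node pays orders of magnitude in area and energy for the same computation.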

Idea: outsource computations to untrusted chips


Verifiable ASICs [WHGsW16]

[Diagram, built up in stages:]
• A principal turns a function F into designs for P and V.
• An untrusted fab (fast) builds P; a trusted fab (slow) builds V.
• An integrator assembles V and P into one system.
• At run time, the system takes an input and produces an output: V sends x to P, and P replies with y and a proof that y = F(x).

Can Verifiable ASICs be practical?

[Diagram: V and P exchanging x, y, and a proof that y = F(x), compared against a native implementation of F.]

V overhead: checking the proof is cheap.

P overhead: high compared to the cost of F . . . but P uses an advanced circuit technology.

Precomputation: proportional to the cost of F. Prior work assumes this away.

Prior work achieves V + P < F, but V + P + Precomp > F.

Our goal: V + P + Precomp < F.
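To make the accounting concrete, here is a minimal sketch of the break-even comparison this goal implies. The energy figures below are hypothetical placeholders, not measurements from the paper:

```python
# Break-even accounting for verifiable outsourcing (illustrative sketch).
# All energy figures used below are hypothetical, NOT measured values.

def outsourcing_wins(e_v, e_p, e_precomp, e_f, batch_size):
    """True if outsourcing a batch costs less energy than native execution.

    e_v:       energy for V to verify one instance
    e_p:       energy for P to prove one instance
    e_precomp: one-time precomputation energy, amortized over the batch
    e_f:       energy to execute F once on a trusted chip
    """
    outsourced = batch_size * (e_v + e_p) + e_precomp
    native = batch_size * e_f
    return outsourced < native

# Precomputation dominates at small batch sizes and amortizes away at large ones.
for n in (1, 10, 100, 1000, 10000):
    print(n, outsourcing_wins(e_v=0.2, e_p=0.3, e_precomp=500.0, e_f=1.0,
                              batch_size=n))
```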


Roadmap

1. Verifiable ASICs

2. Giraffe: a high-level view

3. Evaluation

Evolution of Giraffe's back-end

GKR08: the base protocol

CMT12: reduces P and precomputation costs for all circuits

Thaler13: reduces precomputation for structured circuits

Giraffe: reduces P cost for structured circuits (plus optimizations for V; see the paper)

Let's take a high-level look at how these optimizations work. (The following all use a nice simplification [Thaler15].)


GKR08 (a quick reminder)

[Figure: a layered arithmetic circuit of depth d and width G.]

For each layer of an arithmetic circuit, P and V engage in a sum-check protocol.

In the first round, P computes (for $q \in \mathbb{F}^{\log G}$):

$$\sum_{h_0 \in \{0,1\}^{\log G}} \; \sum_{h_1 \in \{0,1\}^{\log G}} \Big[ \widetilde{\mathrm{add}}(q, h_0, h_1)\big(\widetilde{V}(h_0) + \widetilde{V}(h_1)\big) + \widetilde{\mathrm{mul}}(q, h_0, h_1)\big(\widetilde{V}(h_0) \cdot \widetilde{V}(h_1)\big) \Big]$$

This has $2^{2\log G} = G^2$ terms. In total, P's work is O(poly(G)).

Precomputation is one evaluation of $\widetilde{\mathrm{add}}$ and $\widetilde{\mathrm{mul}}$, costing O(poly(G)).
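To see where the $G^2$ comes from, here is a minimal Python sketch of P's naive first-round sum (a toy sketch, assuming dense tables of the wiring predicates already evaluated at V's point q; the names are made up for illustration):

```python
# Naive first-round sum (GKR08 flavor): iterate over ALL (h0, h1) pairs.
P_MOD = 2**61 - 1  # a toy prime modulus standing in for the field F

def first_round_naive(add_tbl, mul_tbl, v):
    """G^2 terms for a layer of width G.

    add_tbl[h0][h1], mul_tbl[h0][h1]: wiring predicates at the point q;
    v[h]: extension of the layer's wire values.
    """
    G = len(v)
    total = 0
    for h0 in range(G):
        for h1 in range(G):
            total += add_tbl[h0][h1] * (v[h0] + v[h1])
            total += mul_tbl[h0][h1] * (v[h0] * v[h1])
    return total % P_MOD
```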


CMT12: from polynomial to quasilinear

[Figure: a layered arithmetic circuit of depth d and width G, with an example layer: four inputs, constants 121665 and 0, four MUX gates, and gates labeled 0–3.]

add(g_O, g_L, g_R) = 0 except when g_O is a + gate with inputs g_L, g_R. In the example layer, add(3, 2, 3) = 1; otherwise add(· · ·) = 0.

This means we can rewrite P's sum in the first round as:

$$\sum_{(h_0,h_1) \in S_{\mathrm{add}}} \widetilde{\mathrm{add}}(q, h_0, h_1)\big(\widetilde{V}(h_0) + \widetilde{V}(h_1)\big) \;+ \sum_{(h_0,h_1) \in S_{\mathrm{mul}}} \widetilde{\mathrm{mul}}(q, h_0, h_1)\big(\widetilde{V}(h_0) \cdot \widetilde{V}(h_1)\big)$$

where $S_{\mathrm{add}}$ and $S_{\mathrm{mul}}$ are the input pairs actually wired to add and mul gates. G terms per round for 2 log G rounds: P's work is O(G log G).

Using a related trick, precomputing $\widetilde{\mathrm{add}}$ and $\widetilde{\mathrm{mul}}$ costs O(G) in total.
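A sketch of the same first round exploiting this sparsity, iterating over the G gates instead of all G² pairs (again a toy sketch; the gate-list format is made up for illustration):

```python
# Sparse first-round sum (CMT12 flavor): one term per gate.
P_MOD = 2**61 - 1

def first_round_sparse(gates, add_at_q, mul_at_q, v):
    """G terms: gates is a list of (op, left, right) wiring entries;
    add_at_q[i] / mul_at_q[i] are the wiring predicates' contributions
    at the point q for gate i; v[h] is the layer-value extension."""
    total = 0
    for i, (op, left, right) in enumerate(gates):
        if op == "add":
            total += add_at_q[i] * (v[left] + v[right])
        else:  # "mul"
            total += mul_at_q[i] * (v[left] * v[right])
    return total % P_MOD
```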


Thaler13: more structure, less precomputation

[Figure: a layered circuit of depth d built from N identical subcircuits, each of width G.]

Idea: for a batch of identical subcircuits, $\widetilde{\mathrm{add}}$ and $\widetilde{\mathrm{mul}}$ can be "small."

As before, add(3, 2, 3) = 1 and otherwise add(· · ·) = 0. Notice that $\widetilde{\mathrm{add}}$ does not comprehend the subcircuit number!

[Figure: two copies of the example layer, subckt #0 and subckt #1, each with four inputs, constants 121665 and 0, four MUX gates, and gates labeled 0–3.]

⇒ Precomputation costs O(G), amortized over N copies!

Now P's sum in the first round is (for $q' \in \mathbb{F}^{\log N}$):

$$\sum_{(h_0,h_1)\in S_{\mathrm{add}}} \widetilde{\mathrm{add}}(q,h_0,h_1) \sum_{h'\in\{0,1\}^{\log N}} \widetilde{\mathrm{eq}}(q',h')\big(\widetilde{V}(h',h_0)+\widetilde{V}(h',h_1)\big) \;+ \sum_{(h_0,h_1)\in S_{\mathrm{mul}}} \widetilde{\mathrm{mul}}(q,h_0,h_1) \sum_{h'\in\{0,1\}^{\log N}} \widetilde{\mathrm{eq}}(q',h')\big(\widetilde{V}(h',h_0)\cdot\widetilde{V}(h',h_1)\big)$$

Here eq(x, y) = 1 iff x = y; its multilinear extension is $\widetilde{\mathrm{eq}}(x,y) = \prod_i \big(x_i y_i + (1-x_i)(1-y_i)\big)$. For each gate, P sums over each subcircuit.

NG terms per round in the first 2 log G rounds: P's work is Ω(NG log G).
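Extending the earlier sketch to the batched setting (a toy illustration; the per-subcircuit value array and precomputed eq values are assumptions for the example, not part of the original):

```python
# Batched first-round sum (Thaler13 flavor): for each gate, an inner
# sum over all N subcircuits, selected by the eq polynomial.
P_MOD = 2**61 - 1

def first_round_batched(gates, add_at_q, mul_at_q, eq_at_qp, v):
    """N*G terms. eq_at_qp[h]: eq~ evaluated at (q', h);
    v[h][g]: value extension for subcircuit h, gate label g."""
    total = 0
    for i, (op, left, right) in enumerate(gates):
        for h in range(len(eq_at_qp)):  # the inner sum over subcircuits
            if op == "add":
                total += add_at_q[i] * eq_at_qp[h] * (v[h][left] + v[h][right])
            else:
                total += mul_at_q[i] * eq_at_qp[h] * (v[h][left] * v[h][right])
    return total % P_MOD
```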


Giraffe: leveraging structure to reduce P costs

[Figure: a layered circuit of depth d built from N identical subcircuits, each of width G.]

Idea: arrange for the copies to "collapse" during the sum-check protocol.

Rewriting the prior sum and changing the sum-check order:

$$\sum_{h'\in\{0,1\}^{\log N}} \widetilde{\mathrm{eq}}(q',h') \sum_{(h_0,h_1)\in S_{\mathrm{add}}} \widetilde{\mathrm{add}}(q,h_0,h_1)\big(\widetilde{V}(h',h_0)+\widetilde{V}(h',h_1)\big) \;+ \sum_{h'\in\{0,1\}^{\log N}} \widetilde{\mathrm{eq}}(q',h') \sum_{(h_0,h_1)\in S_{\mathrm{mul}}} \widetilde{\mathrm{mul}}(q,h_0,h_1)\big(\widetilde{V}(h',h_0)\cdot\widetilde{V}(h',h_1)\big)$$

For each subcircuit, P sums over each gate, and the sum-check now binds h′ first: in round 1, $h' \in \{0,1\}^{\log N}$; in round 2, $h' \in \{0,1\}^{\log N - 1}$; in round 3, $h' \in \{0,1\}^{\log N - 2}$; and so on.

P does $(N + \tfrac{N}{2} + \tfrac{N}{4} + \cdots)G + 2G\log G = O(NG + G\log G)$ work, since the geometric series sums to less than 2N.

⇒ Linear in the size of the computation when N > log G!
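A small sketch of this work accounting (illustrative only; it counts terms rather than running the protocol):

```python
import math

# Term counts for Giraffe's reordered sum-check on one layer.
def prover_terms(N, G):
    """Rounds over h' halve the subcircuit dimension each time;
    the remaining 2*log G rounds cost about G terms each."""
    terms, n = 0, N
    for _ in range(int(math.log2(N))):
        terms += n * G   # current h' domain size, times G gates
        n //= 2          # the copies "collapse": the domain halves
    terms += 2 * int(math.log2(G)) * G
    return terms

# Example: N = 1024 copies, G = 64 gates per copy.
# (N + N/2 + ...)G < 2NG, so the total is O(NG + G log G).
print(prover_terms(1024, 64))  # roughly 2*1024*64 + 2*64*6
```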


Roadmap

1. Verifiable ASICs

2. Giraffe: a high-level view

3. Evaluation

Implementation

Giraffe is an end-to-end hardware generator:

• a hardware design template: given a computation and chip parameters (technology, size, . . . ), it produces optimized hardware designs for P and V

• a compiler for a subset of C: it produces the representation used by the design template


Evaluation questions

How does Giraffe perform on real-world computations?

1. Curve25519 point multiplication

2. Image matching

Goal: the total cost of V, P, and precomputation should be less than building F on a trusted chip.


Evaluation method

[Diagram: V and P exchanging x, y, and a proof that y = F(x), vs. a native implementation of F.]

Baselines: Zebra; an implementation of F in the same technology as V

Metric: total energy consumption

Measurements: based on circuit synthesis and simulation, published chip designs, and CMOS scaling models

Charge for V, P, communication, precomputation, and the PRNG

Constraints: trusted fab = 350 nm; untrusted fab = 7 nm; 200 mm² max chip area; 150 W max total power

(350 nm: 1997, the Pentium II; 7 nm: ≈2018. That is a ≈20-year gap between the trusted and untrusted fabs.)


Application #1: Curve25519 point multiplication

Curve25519: a commonly-used elliptic curve

Point multiplication: primitive, e.g., for ECDH
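For context, here is a sketch of the arithmetic this primitive reduces to: one Montgomery-ladder step on Curve25519 (formulas as in RFC 7748; note the curve constant 121665, which also appears as a constant input in the example circuit earlier). This is illustrative Python, not Giraffe's hardware design:

```python
# One Montgomery-ladder step for Curve25519 x-only point multiplication.
p = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4, the curve constant

def ladder_step(x1, x2, z2, x3, z3):
    """Combined differential doubling and addition on the curve's x-line."""
    a, b = (x2 + z2) % p, (x2 - z2) % p
    aa, bb = a * a % p, b * b % p
    e = (aa - bb) % p
    c, d = (x3 + z3) % p, (x3 - z3) % p
    da, cb = d * a % p, c * b % p
    x4, z4 = aa * bb % p, e * (aa + A24 * e) % p          # doubling
    x5, z5 = (da + cb) ** 2 % p, x1 * (da - cb) ** 2 % p  # diff. addition
    return x4, z4, x5, z5
```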

[Plot: total energy cost in Joules (lower is better), on a log scale from 0.01 to 100, vs. log₂ N, the number of copies of the subcircuit (1 to 15), for Native, Giraffe, and Zebra.]

Application #2: Image matching

Image matching via Fast Fourier transform

C implementation, compiled by Giraffe's front-end to V and P hardware designs, with no hand tweaking!
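To illustrate the underlying technique (a sketch of FFT-based matching in general, not Giraffe's actual C source), matching finds where a template best fits an image by computing a cross-correlation via the convolution theorem:

```python
import numpy as np

# FFT-based image matching: circular cross-correlation via the FFT.
def match_offset(image, template):
    """Return the (row, col) shift at which the template matches best."""
    f_img = np.fft.rfft2(image)
    f_tmp = np.fft.rfft2(template, s=image.shape)  # zero-pad to image size
    corr = np.fft.irfft2(f_img * np.conj(f_tmp), s=image.shape)
    return np.unravel_index(np.argmax(corr), corr.shape)
```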

[Plot: total energy cost in Joules (lower is better), on a log scale from 0.01 to 100, vs. log₂ N, the number of copies of the subcircuit (3 to 15), for Native and Giraffe.]

Recap: is it practical?

[Diagram: V and P exchanging x, y, and a proof that y = F(x).]

✗ Giraffe is restricted to batched computations

Giraffe's front-end includes two static analysis passes:
• Slicing extracts only the parts of programs that can be efficiently outsourced
• Squashing extracts batch-parallelism from serial computations

✓ Giraffe's proof protocol and optimizations save orders of magnitude compared to prior work

✓ Giraffe is the first system in the literature to account for all costs, and win.

Giraffe is a step, but much work remains!

https://giraffe.crypto.fyi
http://www.pepper-project.org
