Dataflow: the concurrency/parallelism architecture you need

Posted on 15-Jan-2015


An informal investigation/tutorial on the dataflow architecture for Java and Groovy as presented at DevoxxUK 2014. Code presented is on GitHub: https://github.com/russel/MeanStdDev.git

Transcript of Dataflow: the concurrency/parallelism architecture you need

@russel_winder#devoxxuk #dataflowrules Copyright © 2014 Russel Winder

Dataflow: The Concurrency/Parallelism Architecture You Need

Russel Winder, @russel_winder, http://www.russel.org.uk, russel@winder.org.uk


What is Dataflow?


What are (in computing†):

Concurrency:

Structuring solution and code such that multiple parts may execute independently and possibly even at the same time.

Parallelism:

Executing multiple parts of a system at the same time on different processors so as to make things run faster.

†In natural language these words have very different meanings.


What is Dataflow?

An architecture comprising operators connected by channels along which data flows from one operator to another. Each operator may have multiple input channels and multiple output channels, and executes code only in response to the arrival of data on its inputs.
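To make the definition concrete, here is a minimal sketch in plain Java, assuming only the JDK: `BlockingQueue`s stand in for channels and a thread stands in for an operator. The names `AddOperator`, `DataflowAddDemo` and `addPairwise` are hypothetical, made up for illustration; this is not the GPars API. The operator has two input channels and one output channel, and does nothing until a value has arrived on both inputs.

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical operator: two input channels, one output channel.
// It fires (computes) only when a value is available on both inputs.
class AddOperator implements Runnable {
    private final BlockingQueue<Integer> left;
    private final BlockingQueue<Integer> right;
    private final BlockingQueue<Integer> output;
    private final int firings; // number of times to fire before terminating

    AddOperator(BlockingQueue<Integer> left, BlockingQueue<Integer> right,
                BlockingQueue<Integer> output, int firings) {
        this.left = left;
        this.right = right;
        this.output = output;
        this.firings = firings;
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < firings; i++) {
                int a = left.take();   // blocks until data arrives on the left channel
                int b = right.take();  // blocks until data arrives on the right channel
                output.put(a + b);     // send the result downstream
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class DataflowAddDemo {
    // Wires up a one-operator network; xs and ys are assumed to have equal length.
    static int[] addPairwise(int[] xs, int[] ys) {
        BlockingQueue<Integer> left = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> right = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> out = new LinkedBlockingQueue<>();
        new Thread(new AddOperator(left, right, out, xs.length)).start();
        try {
            for (int i = 0; i < xs.length; i++) {
                left.put(xs[i]);
                right.put(ys[i]);
            }
            int[] results = new int[xs.length];
            for (int i = 0; i < results.length; i++) {
                results[i] = out.take();
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(addPairwise(new int[]{1, 2, 3}, new int[]{10, 20, 30})));
    }
}
```

Because `take()` blocks, the operator needs no locks and no polling: the arrival of data is the event that makes it run.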


Historically

Dataflow computers:
– Values flowing between…
– …operators that calculate…
– …new values to pass to…
– …other operators.

Dataflow hardware didn't take off, but the architecture works at various scales.

J R Gurd, C C Kirkham, I Watson, The Manchester Prototype Dataflow Computer, CACM 28(1), 1985-01.


Dataflow diagrams have been an integral part of the analysis and design of information systems since the 1970s.

T de Marco, Structured Analysis and Systems Specification, Yourdon Press, NY, 1978.


Dataflow and Functional

Operators seem like they might be pure functions, but…

…they are not necessarily: operators may have internal state.

Operators may be referentially transparent, but they may not be.

Operators may even have side effects.
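A minimal sketch of an operator that is not a pure function, again assuming plain JDK queues as channels. `RunningTotalOperator` and `RunningTotalDemo` are hypothetical names, and the `STOP` end-of-stream marker is a convention invented for this sketch. The operator keeps a running total as internal state, so feeding it the same value three times gives three different outputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stateful operator: emits the running total of its inputs.
// STOP is an end-of-stream marker chosen for this sketch only.
class RunningTotalOperator implements Runnable {
    static final long STOP = Long.MIN_VALUE;

    private final BlockingQueue<Long> input;
    private final BlockingQueue<Long> output;
    private long total = 0; // internal state, touched only by this operator's thread

    RunningTotalOperator(BlockingQueue<Long> input, BlockingQueue<Long> output) {
        this.input = input;
        this.output = output;
    }

    @Override
    public void run() {
        try {
            while (true) {
                long x = input.take();
                if (x == STOP) {
                    output.put(STOP); // propagate end-of-stream downstream
                    return;
                }
                total += x;
                output.put(total);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class RunningTotalDemo {
    static List<Long> runningTotals(long... values) {
        BlockingQueue<Long> in = new LinkedBlockingQueue<>();
        BlockingQueue<Long> out = new LinkedBlockingQueue<>();
        new Thread(new RunningTotalOperator(in, out)).start();
        List<Long> results = new ArrayList<>();
        try {
            for (long v : values) {
                in.put(v);
            }
            in.put(RunningTotalOperator.STOP);
            while (true) {
                long r = out.take();
                if (r == RunningTotalOperator.STOP) {
                    return results;
                }
                results.add(r);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same input, 5, yields a different output each time.
        System.out.println(runningTotals(5, 5, 5)); // [5, 10, 15]
    }
}
```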


Dataflow is an event-based architecture.


Dataflow systems are (possibly) reactive systems.

Which would make them exceedingly trendy, even if the idea is very old.


Dataflow systems have no† shared memory.

† Or at least they should have none.


[Diagram: operator, channel]


Dataflow systems are message-passing systems.


Each operator must† be single-threaded.

† Or at least should be.


Dataflow Frameworks

Scala:
– Future

Akka:
– Dataflow variables, aka Promise
– Deprecated in favour of Async

Java:
– Pre-8, Future
– 8+, CompletableFuture, aka Promise
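As a sketch of the promise style with Java 8's `CompletableFuture` (the `PromiseDemo` class and its `mean` method are hypothetical): two tasks each produce a single value, and `thenCombine` acts as an operator that fires once both values are available.

```java
import java.util.concurrent.CompletableFuture;

class PromiseDemo {
    static double mean(double[] data) {
        // Two independent tasks, each producing exactly one value.
        CompletableFuture<Double> sum = CompletableFuture.supplyAsync(() -> {
            double s = 0.0;
            for (double x : data) {
                s += x;
            }
            return s;
        });
        CompletableFuture<Integer> count = CompletableFuture.supplyAsync(() -> data.length);
        // thenCombine runs its function only once both inputs have been completed.
        CompletableFuture<Double> result = sum.thenCombine(count, (s, n) -> s / n);
        return result.join(); // block until the combined value is available
    }

    public static void main(String[] args) {
        System.out.println(mean(new double[]{1.0, 2.0, 3.0, 4.0})); // 2.5
    }
}
```

Each future carries exactly one value, so a stream of values would need a new future per value.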


Architectural Issue

Each of the aforementioned frameworks assumes that each operator creates a single value. Communication is by dataflow variables: each dataflow variable is a thread-safe single assignment variable.
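The single-assignment behaviour can be seen directly with `CompletableFuture` (the `SingleAssignmentDemo` class is hypothetical): the first `complete` call binds the value, later calls are ignored, and a computation registered before binding runs once the value arrives.

```java
import java.util.concurrent.CompletableFuture;

class SingleAssignmentDemo {
    static String demo() {
        CompletableFuture<String> variable = new CompletableFuture<>();
        // A dependent computation registered before the variable has a value.
        CompletableFuture<Integer> length = variable.thenApply(String::length);
        boolean first = variable.complete("first");   // true: binds the variable
        boolean second = variable.complete("second"); // false: already bound, value unchanged
        return first + " " + second + " " + variable.join() + " " + length.join();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true false first 5
    }
}
```

One caveat: `CompletableFuture` also has `obtrudeValue`, which forcibly overwrites the value, so it is single-assignment by convention rather than by construction.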


GPars…

Has dataflow variables (promises) and tasks and so can do everything Akka and Java can offer.

Has DataflowQueue, and so can create real dataflow networks.
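GPars' `DataflowQueue` is not reproduced here; as a stand-in assuming only the JDK, a `LinkedBlockingQueue` can play the role of a channel that carries a stream of values rather than a single one. The sketch wires a source, a squaring operator and a summing sink into a small network. `PipelineDemo`, `startSquarer`, `sumOfSquares` and the `STOP` marker are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PipelineDemo {
    static final long STOP = Long.MIN_VALUE; // end-of-stream marker for this sketch

    // Operator on its own thread: squares each value arriving on `in`, sends it on `out`.
    static void startSquarer(BlockingQueue<Long> in, BlockingQueue<Long> out) {
        new Thread(() -> {
            try {
                while (true) {
                    long x = in.take();
                    if (x == STOP) {
                        out.put(STOP); // pass end-of-stream along
                        return;
                    }
                    out.put(x * x);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }

    // Source -> squarer -> summing sink, connected by unbounded queues.
    static long sumOfSquares(long... values) {
        BlockingQueue<Long> numbers = new LinkedBlockingQueue<>();
        BlockingQueue<Long> squares = new LinkedBlockingQueue<>();
        startSquarer(numbers, squares);
        try {
            for (long v : values) {
                numbers.put(v); // many values flow through one channel
            }
            numbers.put(STOP);
            long total = 0;
            while (true) {
                long sq = squares.take();
                if (sq == STOP) {
                    return total;
                }
                total += sq;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1, 2, 3, 4)); // 30
    }
}
```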


One does like to code…

…doesn't one.


We need a problem…


A Problem

Calculate mean and standard deviation of a data sample.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
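A direct sequential translation of the two formulas, as a baseline (the class and method names are hypothetical, and at least two data values are assumed). Note that the standard deviation needs x̄ before it can start, so the data is traversed twice.

```java
class SequentialMeanStdDev {
    // Mean: (1/n) * sum of x_i.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation: sqrt(sum of (x_i - mean)^2 / (n - 1)).
    // Needs the mean first, so it makes a second pass over the data.
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(mean(data));   // 5.0
        System.out.println(stdDev(data)); // sqrt(32/7), about 2.138
    }
}
```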


Amend the Problem

$$s = \sqrt{\frac{1}{n-1}\left(\left(\sum_{i=1}^{n} x_i^2\right) - n\bar{x}\bar{x}\right)}$$

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
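The amended formula follows from expanding the square and using $\sum_{i=1}^{n} x_i = n\bar{x}$:

```latex
\begin{aligned}
\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2
  &= \sum_{i=1}^{n}x_i^2 - 2\bar{x}\sum_{i=1}^{n}x_i + n\bar{x}\bar{x} \\
  &= \sum_{i=1}^{n}x_i^2 - 2n\bar{x}\bar{x} + n\bar{x}\bar{x} \\
  &= \left(\sum_{i=1}^{n}x_i^2\right) - n\bar{x}\bar{x}
\end{aligned}
```

After the rewrite, s depends only on n, $\sum x_i$ and $\sum x_i^2$, none of which needs x̄, so all three can be computed independently and in a single pass over the data. One caution: subtracting two large, nearly equal quantities can lose floating-point precision when the mean is large relative to the spread.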


Code


Switch to using an IDE for this.

Code Example
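The code shown in the session is in the GitHub repository mentioned above and is not reproduced here. As a JDK-only sketch (not the author's GPars code), the amended formula maps onto a small dataflow network: a source sends every value down three channels, three single-threaded operators accumulate n, Σxᵢ and Σxᵢ² independently, and a final step fires once all three totals have arrived. `MeanStdDevNetwork`, `reducer` and `meanAndStdDev` are hypothetical names, and using `Double.NaN` as the end-of-stream marker is an assumption of this sketch (so the data itself must not contain NaN).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.DoubleUnaryOperator;

class MeanStdDevNetwork {
    // End-of-stream marker for this sketch; the data must not contain NaN.
    static final double STOP = Double.NaN;

    // Operator on its own thread: accumulates f(x) over its input channel and,
    // at end-of-stream, sends the single total on its output channel.
    static void reducer(BlockingQueue<Double> in, BlockingQueue<Double> out, DoubleUnaryOperator f) {
        new Thread(() -> {
            double acc = 0.0; // state confined to this operator's thread
            try {
                while (true) {
                    double x = in.take();
                    if (Double.isNaN(x)) {
                        out.put(acc);
                        return;
                    }
                    acc += f.applyAsDouble(x);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }

    // Returns {mean, sample standard deviation}; assumes at least two values.
    static double[] meanAndStdDev(double[] data) {
        BlockingQueue<Double> toCount = new LinkedBlockingQueue<>();
        BlockingQueue<Double> toSum = new LinkedBlockingQueue<>();
        BlockingQueue<Double> toSumSq = new LinkedBlockingQueue<>();
        BlockingQueue<Double> countOut = new LinkedBlockingQueue<>();
        BlockingQueue<Double> sumOut = new LinkedBlockingQueue<>();
        BlockingQueue<Double> sumSqOut = new LinkedBlockingQueue<>();
        reducer(toCount, countOut, x -> 1.0);
        reducer(toSum, sumOut, x -> x);
        reducer(toSumSq, sumSqOut, x -> x * x);
        try {
            // Source: broadcast every value, then end-of-stream, to all three operators.
            for (double x : data) {
                toCount.put(x);
                toSum.put(x);
                toSumSq.put(x);
            }
            toCount.put(STOP);
            toSum.put(STOP);
            toSumSq.put(STOP);
            // Final step (on the caller's thread): take() blocks until each total arrives.
            double n = countOut.take();
            double sum = sumOut.take();
            double sumSq = sumSqOut.take();
            double mean = sum / n;
            double s = Math.sqrt((sumSq - n * mean * mean) / (n - 1));
            return new double[]{mean, s};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        double[] r = meanAndStdDev(new double[]{2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println(r[0] + " " + r[1]); // mean 5.0, s = sqrt(32/7)
    }
}
```

The three accumulating operators share nothing and can run in parallel; the only synchronization is the blocking `take()` on each channel.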


Summary

Dataflow is an architecture:

Event-driven, single-threaded operators communicating by message passing using channels.

Dataflow is an easement:

Synchronization is inherent in the model, and there is no shared memory, so all deadlocks are trivial.


Dataflow is a way of harnessing concurrency and parallelism in easy-to-program ways.


GPars is usable from Java as well as Groovy.


Testing is really Groovy with Spock.


Dataflow is an architecture of code you need to know.


Q & A
