Big Data Little Tests - Agile Alliance

Posted on 24-Jun-2020

Big Data! Little Tests

John Heintz

Founder, Gist Labs

Technical Consultant, Cutter Consortium

john@gistlabs.com @jheintz

http://gistlabs.com

© 2012 Gist Labs, LLC

About John Heintz

•  Developer since 1995

•  Agilist since 1999

•  Founded Gist Labs in 2008

•  Developer, Mentor, Consultant

•  Intuitive, Abstract, Precise

Kool-Aids I’ve drunk: Agile/Lean/Kanban, OO, TDD, REST, Mentoring, Craftsmanship, Emergent/Progressive Design, InnovationGames®, Systems and Complexity Theory

My Goals for You

•  Demystify test automation for Big Data

•  Provide executable examples

What you shouldn’t expect…

•  More than a brief introduction to Big Data concepts

•  Performance tuning

Simple Code, Config

•  I kept the code and configuration as simple and clear as possible

•  Java, JUnit4

•  Maven… okay, maybe not simple :-\

Mostly Code

•  Remember the Law of Two Feet

•  If code isn’t what you were looking for, I totally respect you finding something better to do with your time :-)

•  Everything is available from http://gistlabs.com/2012/08/big-data-little-tests/

•  The entire command script is there, so you can take notes assuming you will have it

My Soapboxes…

These are topics I’ll keep coming back to

•  Fast test execution

• One-click build

Big Data

•  Too much data (volume)

•  Arriving too fast (velocity)

•  Not trivially structured (variety)

Map Reduce

•  Map from one input record to zero or more key/value outputs

•  Reduce from many inputs (all the values for one key) to one output

•  Can be run in parallel

•  Crude, but scales massively
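The map, shuffle, and reduce steps above can be sketched in a single Java process using word count, the usual teaching example. This is a minimal sketch, assuming words are separated by whitespace or punctuation; it contains no Hadoop code, and the names `MapReduceSketch`, `map`, `reduce`, and `wordCount` are hypothetical. The grouping loop stands in for the shuffle that a real cluster performs between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of map -> shuffle -> reduce (word count).
// No Hadoop involved: it only illustrates the data flow.
final class MapReduceSketch {

    // Map: one input record (a line) -> zero or more (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) { // split can yield an empty first token
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce: all values emitted for one key -> one output value.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        // Shuffle: group every mapped value by key. Each map(line) call is
        // independent of the others, which is what lets a cluster parallelize it.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce each key's group; TreeMap keeps the output sorted by key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("Big data", "little tests, big wins")));
    }
}
```

Running `main` prints `{big=2, data=1, little=1, tests=1, wins=1}`. Because each `map` call depends only on its own line, and each `reduce` call only on one key's values, both phases could be spread across machines.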

CAP Theorem

•  Consistency

•  Availability

•  Partition Tolerance
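A toy model can make the trade-off concrete. The sketch below is illustrative only, not a model of any real database: it keeps two replicas of a key/value map. While a network partition is simulated, a write cannot reach the second replica, so the system must either refuse the write (keeping the replicas consistent) or accept it locally (staying available) and let the replicas disagree. The names `CapToy`, `Mode.CP`, and `Mode.AP` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the CAP trade-off with two replicas.
// During a partition, writes to replica A cannot reach replica B.
final class CapToy {

    enum Mode { CP, AP }

    private final Mode mode;
    private final Map<String, String> replicaA = new HashMap<>();
    private final Map<String, String> replicaB = new HashMap<>();
    private boolean partitioned = false;

    CapToy(Mode mode) {
        this.mode = mode;
    }

    void setPartitioned(boolean partitioned) {
        this.partitioned = partitioned;
    }

    // Write via replica A. Returns false if the write was refused.
    boolean write(String key, String value) {
        if (!partitioned) {
            replicaA.put(key, value);
            replicaB.put(key, value); // replication succeeds
            return true;
        }
        if (mode == Mode.CP) {
            return false; // consistency chosen: refuse rather than diverge
        }
        replicaA.put(key, value); // availability chosen: accept locally only
        return true;
    }

    String readA(String key) { return replicaA.get(key); }

    String readB(String key) { return replicaB.get(key); }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            CapToy db = new CapToy(mode);
            db.write("k", "v1");
            db.setPartitioned(true);
            boolean accepted = db.write("k", "v2");
            System.out.println(mode + ": accepted=" + accepted
                    + " A=" + db.readA("k") + " B=" + db.readB("k"));
        }
    }
}
```

Running `main` prints `CP: accepted=false A=v1 B=v1` and then `AP: accepted=true A=v2 B=v1`: under a partition, the toy can keep its replicas consistent or keep accepting writes, but not both.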

Big Data Ecosystem

•  Hadoop: A giant among giants

(Tons of projects on this platform!!)

•  Cassandra: Feels like a weird RDBMS

•  Riak: An elegant key/value/search store

• MongoDB: Document store

Let’s Run Some Code

Hadoop Tests
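The talk's own Hadoop tests are available from the Gist Labs URL listed in these slides. As a separate, minimal sketch of the same "little test" idea, assume a hypothetical job that counts HTTP status codes in access-log lines of the form `ip method path status`. The per-record mapping logic is kept free of Hadoop types, so a Hadoop `Mapper` could delegate to it, while small crafted inputs (including malformed lines) exercise it in milliseconds without a cluster. The class `StatusCodeMapLogic` and the log format are assumptions, and the checks use plain Java instead of JUnit4 only to stay dependency-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical mapper logic with no Hadoop types, so it is trivially testable:
// parse "ip method path status" lines and emit (status, 1).
final class StatusCodeMapLogic {

    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] fields = line.trim().split("\\s+");
        // Malformed lines are skipped instead of crashing the job.
        if (fields.length == 4 && fields[3].matches("\\d{3}")) {
            out.add(Map.entry(fields[3], 1));
        }
        return out;
    }

    // A "little test": tiny crafted inputs, no cluster, plain checks.
    public static void main(String[] args) {
        check(map("10.0.0.1 GET /index.html 200"), List.of(Map.entry("200", 1)));
        check(map("10.0.0.2 GET /missing 404"), List.of(Map.entry("404", 1)));
        check(map("garbage"), List.of());
        check(map(""), List.of());
        check(map("10.0.0.3 GET /x OK"), List.of());
        System.out.println("all map checks passed");
    }

    private static void check(Object actual, Object expected) {
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

In a JUnit4 test class, each `check` call would naturally become its own `@Test` method using `assertEquals`.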

Riak Tests
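The slide does not show how the Riak tests are written, and they may well run against a real Riak node. One complementary technique, sketched here under explicit assumptions, is to put a small interface between application code and the store, then unit-test the application code against an in-memory fake. `KeyValueStore`, `InMemoryKeyValueStore`, and `SessionRepository` are hypothetical names, not part of any Riak client library; the bucket/key shape simply mirrors Riak's data model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class KeyValueFakeSketch {

    // Hypothetical seam between application code and the store.
    // A production implementation would wrap a real Riak client.
    interface KeyValueStore {
        void put(String bucket, String key, String value);
        Optional<String> get(String bucket, String key);
        void delete(String bucket, String key);
    }

    // In-memory fake honoring the same contract: no server, no network.
    static final class InMemoryKeyValueStore implements KeyValueStore {
        private final Map<String, Map<String, String>> buckets = new HashMap<>();

        @Override
        public void put(String bucket, String key, String value) {
            buckets.computeIfAbsent(bucket, b -> new HashMap<>()).put(key, value);
        }

        @Override
        public Optional<String> get(String bucket, String key) {
            Map<String, String> b = buckets.get(bucket);
            return b == null ? Optional.empty() : Optional.ofNullable(b.get(key));
        }

        @Override
        public void delete(String bucket, String key) {
            Map<String, String> b = buckets.get(bucket);
            if (b != null) {
                b.remove(key);
            }
        }
    }

    // Hypothetical application code under test; it only sees the interface.
    static final class SessionRepository {
        private final KeyValueStore store;

        SessionRepository(KeyValueStore store) { this.store = store; }

        void save(String sessionId, String user) { store.put("sessions", sessionId, user); }

        Optional<String> userFor(String sessionId) { return store.get("sessions", sessionId); }

        void logout(String sessionId) { store.delete("sessions", sessionId); }
    }

    public static void main(String[] args) {
        SessionRepository repo = new SessionRepository(new InMemoryKeyValueStore());
        repo.save("s1", "john");
        if (!repo.userFor("s1").equals(Optional.of("john"))) throw new AssertionError("save/read");
        repo.logout("s1");
        if (repo.userFor("s1").isPresent()) throw new AssertionError("logout");
        System.out.println("session repository checks passed");
    }
}
```

A fake like this keeps tests fast, but it says nothing about how a real cluster behaves under replication or failure, so it complements rather than replaces tests against a running node.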

Other Frameworks

•  CassandraUnit

https://github.com/jsevellec/cassandra-unit

•  PigUnit, for testing Pig Latin scripts (Pig is a high-level data-flow language for Hadoop)

http://pig.apache.org/docs/r0.8.1/pigunit.html

Code Questions?

•  Fast test execution?

• One-click build?

What about Big Tests?

•  Real test data

•  Realistic cluster

Real Test Data

My favorite strategy is to:

•  Develop with small, crafted data

•  Build/test the same way

•  Run another test on top of real prod data
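One way to apply this strategy, sketched under assumptions: write the job so that the same checks run on both kinds of data. With small, crafted input the exact answer is known and can be asserted; with real production data only invariants can be asserted, such as "no money is created or lost". The job (per-customer purchase totals), the `customer,cents` line format, and the names `TotalsPerCustomer` and `checkInvariants` are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical job: total purchase amounts (in cents) per customer.
final class TotalsPerCustomer {

    // Input lines look like "customer,cents" (an assumed format).
    static Map<String, Long> totals(List<String> lines) {
        Map<String, Long> result = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            result.merge(f[0].trim(), Long.parseLong(f[1].trim()), Long::sum);
        }
        return result;
    }

    // Invariants that hold for ANY input, so the same check can run on
    // small crafted data and, later, on a copy of real production data.
    static void checkInvariants(List<String> input, Map<String, Long> output) {
        long inSum = 0;
        for (String line : input) {
            inSum += Long.parseLong(line.split(",")[1].trim());
        }
        long outSum = 0;
        for (long v : output.values()) {
            outSum += v;
        }
        if (inSum != outSum) {
            throw new AssertionError("money created or lost: " + inSum + " vs " + outSum);
        }
        if (output.size() > input.size()) {
            throw new AssertionError("more customers out than lines in");
        }
    }

    public static void main(String[] args) {
        // Small, crafted data: the exact expected answer is known.
        List<String> crafted = List.of("alice,500", "bob,250", "alice,125");
        Map<String, Long> out = totals(crafted);
        if (!out.equals(Map.of("alice", 625L, "bob", 250L))) {
            throw new AssertionError("unexpected totals: " + out);
        }
        checkInvariants(crafted, out);
        // On real data (e.g. a sample file), only the invariants apply:
        // checkInvariants(prodLines, totals(prodLines));
        System.out.println("crafted-data checks passed: " + out);
    }
}
```

Running `main` prints `crafted-data checks passed: {alice=625, bob=250}`. Pointing `checkInvariants` at a sample of production lines adds a test on real data without needing to know the expected totals in advance.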

[Diagram: Continuous Deployment Servers. Components: Developers, Version Control, Continuous Integration Servers, and the Build, Test1, Test2, Staging, and Production clusters. Infrastructure concerns: virtual vs physical servers, network infrastructure, storage infrastructure, developer sandboxes, self-service provisioning, private vs public cloud.]

Realistic Cluster

•  Use a CI/DevOps environment

•  Virtualize, “X as a Service”

•  Virtual Machines

•  Virtual Infrastructure (Network, Storage)

Jenkins CI Server

•  Master/slave clusters

•  Plugins for Hadoop and VMware

•  http://jenkins-ci.org/

Big Questions?

Thank you!

•  Everything is available from:

http://gistlabs.com/2012/08/big-data-little-tests/

•  John Heintz, @jheintz, http://gistlabs.com
