Java day Big Data Analysis In Java World

Big Data Analysis in Java World by Serhiy Masyutin

Transcript of Java day Big Data Analysis In Java World

Page 1: Java day   Big Data Analysis In Java World

Big Data Analysis in Java World by Serhiy Masyutin

Page 2: Java day   Big Data Analysis In Java World

Agenda

The Big Data Problem

Map-Reduce

MPP Analytical Database

In-Memory Data Fabric

Lambda Architecture

Q&A

Page 3: Java day   Big Data Analysis In Java World

The Big Data Problem

- Doug Laney

Big Data

Page 4: Java day   Big Data Analysis In Java World

The Big Data Problem

Comparison of Map-Reduce, MPP AD (MPP Analytical Database) and IMDF (In-Memory Data Fabric):

When do I need it?
Map-Reduce: In an hour
MPP AD: In a minute
IMDF: Now

What do I need to do with it?
Map-Reduce: Exploratory analytics
MPP AD: Structured analytics
IMDF: Singular event processing (some analytics), Transactions

How will I query and search?
Map-Reduce: Unstructured
MPP AD: Ad hoc SQL
IMDF: Structured

How do I need to store it?
Map-Reduce: I do, but not required to
MPP AD: I must and I am required to
IMDF: Temporarily

Where is it coming from?
Map-Reduce: File/ETL
MPP AD: File/ETL
IMDF: Event/Stream/File/ETL

Source: http://blog.pivotal.io/pivotal/products/exploring-big-data-solutions-when-to-use-hadoop-vs-in-memory-vs-mpp

Page 5: Java day   Big Data Analysis In Java World

The Big Data Problem

[Figure labels: Map-Reduce, MPP AD, IMDF; arrow labeled "more processing"]

Data types shown: Transactions, Customer records, Geo-spatial, Sensors, Social Media, XML, JSON, Raw Logs, Text, Image, Video

Source: http://blog.pivotal.io/big-data-pivotal/products/exploratory-data-science-when-to-use-an-mpp-database-sql-on-hadoop-or-map-reduce

Page 6: Java day   Big Data Analysis In Java World

The Big Data Problem

Data is not Information

- Clifford Stoll

Page 7: Java day   Big Data Analysis In Java World

Map-Reduce

http://jeremykun.files.wordpress.com/2014/10/mapreduceimage.gif?w=1800

CPUs aren’t getting faster

Page 8: Java day   Big Data Analysis In Java World

Map-Reduce

https://anonymousbi.files.wordpress.com/2012/11/hadoopdiagram.png

Page 9: Java day   Big Data Analysis In Java World

Map-Reduce

http://hadoop.apache.org/docs/r1.2.1/images/hdfsarchitecture.gif

Page 10: Java day   Big Data Analysis In Java World

CAP Theorem

http://hadoop.apache.org/docs/r1.2.1/images/hdfsarchitecture.gif

Availability

Partition Tolerance

Page 11: Java day   Big Data Analysis In Java World

Map-Reduce

http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    // The mapper emits (word, 1) pairs; the reducer class doubles as a
    // combiner so partial sums are computed on the map side.
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] is the input path, args[1] is the output path.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
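The driver above only wires the job together; TokenizerMapper and IntSumReducer are not shown on the slide. As a rough illustration of what the map, shuffle and reduce phases do for a word count, here is a minimal single-process sketch that uses only the JDK (Java 9 or later). It is not Hadoop code: the class name WordCountSketch and its methods are hypothetical, and a real job would run these phases in parallel across many nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the map-reduce phases.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted by key).
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in sequence over all input lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
        System.out.println(wordCount(lines));
    }
}
```

Because summing is associative, the same reduce step can also serve as a combiner, which is what setCombinerClass(IntSumReducer.class) expresses in the driver.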

Page 12: Java day   Big Data Analysis In Java World

Map-Reduce

Volume: Medium-Large

Variety: Unstructured data

Velocity: Batch processing

Availability

Partition Tolerance

Page 13: Java day   Big Data Analysis In Java World

MPP Analytical Database

http://www.ndm.net/datawarehouse/images/stories/greenplum/gp-dia-3-0.png

Page 14: Java day   Big Data Analysis In Java World

MPP Analytical Database

http://my.vertica.com/docs/7.1.x/HTML/Content/Resources/Images/K-SafetyServerDiagram.png

Page 15: Java day   Big Data Analysis In Java World

MPP Analytical Database

http://my.vertica.com/docs/7.1.x/HTML/Content/Resources/Images/K-SafetyServerDiagramOneNodeDown.png

Page 16: Java day   Big Data Analysis In Java World

MPP Analytical Database

http://my.vertica.com/docs/7.1.x/HTML/Content/Resources/Images/K-SafetyServerDiagramTwoNodesDown.png

Page 17: Java day   Big Data Analysis In Java World

MPP Analytical Database

http://my.vertica.com/docs/7.1.x/HTML/Content/Resources/Images/DataK-Safety-K2Nodes2And3Failed.png
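The K-safety diagrams referenced on the last four slides show which node failures a cluster can absorb. The idea can be sketched with a deliberately simplified model: assume each data segment is stored on its home node plus the next K nodes in a ring, and treat the data as available while every segment has at least one live copy. The class KSafetySketch, the ring layout and the 0-based node numbering are assumptions made for illustration; they are not taken from the Vertica documentation, and a real database may apply additional rules before it stays up.

```java
import java.util.Set;

// Hypothetical, simplified model of K-safety; not Vertica's implementation.
public class KSafetySketch {

    // Returns true if every segment still has at least one live copy.
    // Assumed layout: segment i lives on node i and on the next k nodes in the ring.
    public static boolean allDataAvailable(int nodeCount, int k, Set<Integer> failedNodes) {
        for (int segment = 0; segment < nodeCount; segment++) {
            boolean hasLiveCopy = false;
            for (int offset = 0; offset <= k; offset++) {
                int holder = (segment + offset) % nodeCount;
                if (!failedNodes.contains(holder)) {
                    hasLiveCopy = true;
                    break;
                }
            }
            if (!hasLiveCopy) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Five nodes, K = 1: a single failure is survivable.
        System.out.println(allDataAvailable(5, 1, Set.of(1)));    // true
        // Two non-adjacent failures still leave a copy of every segment.
        System.out.println(allDataAvailable(5, 1, Set.of(1, 3))); // true
        // Two adjacent failures lose segment 1 (held only by nodes 1 and 2).
        System.out.println(allDataAvailable(5, 1, Set.of(1, 2))); // false
        // With K = 2 the same two failures are survivable.
        System.out.println(allDataAvailable(5, 2, Set.of(1, 2))); // true
    }
}
```

In this model, raising K by one adds one more copy of every segment, so availability is bought with extra storage.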

Page 18: Java day   Big Data Analysis In Java World

MPP Analytical Database

public static void main(String[] args) throws ClassNotFoundException {
    // Load the Vertica JDBC driver.
    Class.forName("com.vertica.jdbc.Driver");
    ...
    String connectionUrl = "jdbc:vertica://VerticaHost:5433/ExampleDB";
    String sql = "SELECT id, username FROM users WHERE id = ?";
    // Both resources are closed automatically, in reverse order.
    try (Connection con = DriverManager.getConnection(connectionUrl);
         PreparedStatement ps = con.prepareStatement(sql)) {
        ...
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                ...
            }
        }
    } catch (SQLException e) {
        ...
    }
}

Page 19: Java day   Big Data Analysis In Java World

MPP Analytical Database

Volume: Small-Medium-Large

Variety: Structured data

Velocity: Interactive

ASTER DATABASE

Matrix

Availability

Partition Tolerance

Page 20: Java day   Big Data Analysis In Java World

In-Memory Data Fabric

https://ignite.incubator.apache.org/images/in_memory_data.png

Page 21: Java day   Big Data Analysis In Java World

In-Memory Data Fabric

https://ignite.incubator.apache.org/images/in_memory_data.png

Page 22: Java day   Big Data Analysis In Java World

In-Memory Data Fabric

https://ignite.incubator.apache.org/images/in_memory_compute.png

Page 23: Java day   Big Data Analysis In Java World

In-Memory Data Fabric

public static void main(String[] args) {
    HazelcastInstance instance = Hazelcast.newHazelcastInstance();

    Map<String, User> loggedOnUsers = instance.getMap("Users");
    ...
    loggedOnUsers.put(username, user);
    ...
    if (loggedOnUsers.containsKey(username)) ...
    ...
    loggedOnUsers.remove(username);
    ...
    for (User u : loggedOnUsers.values()) ...
}
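The snippet above works against the plain java.util.Map interface, so the calling code does not need to know whether the map is local or distributed. The following sketch makes that explicit by writing the logged-on-users logic against an injected Map and running it with a local ConcurrentHashMap. The class LoggedOnUsersSketch, its method names and the User type are hypothetical; the slide does not show the real User class or the elided code.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry written against java.util.Map only.
public class LoggedOnUsersSketch {

    // Hypothetical value type. Values kept in a distributed map generally
    // need to be serializable so they can be sent between nodes.
    public static final class User implements Serializable {
        final String username;

        User(String username) {
            this.username = username;
        }
    }

    private final Map<String, User> loggedOnUsers;

    // The backing map is injected; here a local one, in a cluster a distributed one.
    public LoggedOnUsersSketch(Map<String, User> backingMap) {
        this.loggedOnUsers = backingMap;
    }

    public void logOn(String username) {
        loggedOnUsers.put(username, new User(username));
    }

    public boolean isLoggedOn(String username) {
        return loggedOnUsers.containsKey(username);
    }

    public void logOff(String username) {
        loggedOnUsers.remove(username);
    }

    // Returns the names of all logged-on users in alphabetical order.
    public List<String> usernames() {
        List<String> names = new ArrayList<>(loggedOnUsers.keySet());
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Local stand-in; with Hazelcast on the classpath one would pass
        // instance.getMap("Users") instead, as on the slide.
        LoggedOnUsersSketch registry = new LoggedOnUsersSketch(new ConcurrentHashMap<>());
        registry.logOn("bob");
        registry.logOn("alice");
        System.out.println(registry.usernames());          // [alice, bob]
        registry.logOff("alice");
        System.out.println(registry.isLoggedOn("alice"));  // false
    }
}
```

Keeping the dependency on Map rather than on a vendor type also makes the logic easy to unit-test without starting a cluster.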

Page 24: Java day   Big Data Analysis In Java World

In-Memory Data Fabric

Volume: Small-Medium

Variety: Structured data

Velocity: (Near) Real-Time

Availability

Partition Tolerance

Page 25: Java day   Big Data Analysis In Java World

Lambda Architecture

http://lambda-architecture.net

Page 26: Java day   Big Data Analysis In Java World

Lambda Architecture

http://lambda-architecture.net

SQL

Streaming

ElephantDB
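The Lambda Architecture described at lambda-architecture.net combines a batch layer that recomputes views from the full, append-only master dataset with a speed layer that covers only recent data, and it answers queries by merging the two. The sketch below illustrates that merge for a page-view counter. It is a single-threaded toy under stated assumptions: the class LambdaSketch and its methods are hypothetical, everything lives in memory, and events arriving while a batch run is in progress are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of batch view + real-time view merging.
public class LambdaSketch {

    // Master dataset: every event ever seen, append-only.
    private final List<String> masterDataset = new ArrayList<>();
    // Batch view: counts recomputed from scratch by the batch layer.
    private Map<String, Long> batchView = new HashMap<>();
    // Real-time view: counts for events not yet covered by the batch view.
    private final Map<String, Long> realtimeView = new HashMap<>();

    // New data goes to both the master dataset and the speed layer.
    public void onEvent(String page) {
        masterDataset.add(page);
        realtimeView.merge(page, 1L, Long::sum);
    }

    // Batch layer: recompute the view over all data, then discard the
    // real-time state that the fresh batch view now covers.
    public void runBatch() {
        Map<String, Long> fresh = new HashMap<>();
        for (String page : masterDataset) {
            fresh.merge(page, 1L, Long::sum);
        }
        batchView = fresh;
        realtimeView.clear();
    }

    // Serving: a query merges the batch view with the real-time view.
    public long query(String page) {
        return batchView.getOrDefault(page, 0L) + realtimeView.getOrDefault(page, 0L);
    }

    public static void main(String[] args) {
        LambdaSketch views = new LambdaSketch();
        views.onEvent("/home");
        views.onEvent("/home");
        views.onEvent("/about");
        System.out.println(views.query("/home")); // 2 (real-time view only)
        views.runBatch();
        views.onEvent("/home");
        System.out.println(views.query("/home")); // 3 (2 from batch + 1 real-time)
    }
}
```

Because the batch view is always rebuilt from the raw events, a bug in the speed layer is corrected by the next batch run rather than persisting in the results.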

Page 27: Java day   Big Data Analysis In Java World

Lambda Architecture

Volume: Small-Medium-Large

Variety: Unstructured-Structured data

Velocity: (Near) Real-Time

Page 28: Java day   Big Data Analysis In Java World

Q&A

Page 29: Java day   Big Data Analysis In Java World

Thanks folks!