Java one 2015 [con3339]

Real-World Batch Processing with Java EE [CON3339] Arshal Ameen (@AforArsh) Hirofumi Iwasaki (@HirofumiIwasaki) Financial Services Department, Rakuten, Inc.

Transcript of Java one 2015 [con3339]

Page 1: Java one 2015 [con3339]

Real-World Batch Processing with Java EE [CON3339]

Arshal Ameen (@AforArsh), Hirofumi Iwasaki (@HirofumiIwasaki)

Financial Services Department, Rakuten, Inc.

Page 2: Java one 2015 [con3339]

Agenda

What’s Batch?

History of batch frameworks

Types of batch frameworks

Best practices

Demo

Conclusion

Page 3: Java one 2015 [con3339]

“Batch”

Batch processing is the execution of a series of programs ("jobs") on a computer without manual intervention.

Jobs are set up so they can be run to completion without human interaction. All input parameters are predefined through scripts, command-line arguments, control files, or job control language. This is in contrast to "online" or interactive programs which prompt the user for such input. A program takes a set of data files as input, processes the data, and produces a set of output data files.

- From Wikipedia

Page 4: Java one 2015 [con3339]

Batch vs Real-time

Batch
• Long running (minutes - hours)
• JBatch (JSR 352), EJB, POJO, etc.
• Sometimes “job net” or “job stream” reconfiguration required
• Per sec, minutes, hours, days, weeks, months, etc.

Real-time
• Short running (nanosecond - second)
• JSF, EJB, etc.
• Fixed at deploy
• Immediately

Page 5: Java one 2015 [con3339]

Batch vs Real-time Details

                    Batch                          Real-time
Trigger             Scheduler                      On demand
UI support          Optional                       Sometimes UI needed
Availability        Normal                         High
Input data          Small - Large                  Small
Transaction time    Minutes, hours, days, weeks…   ns, ms, s
Transaction cycle   Bulk (chunk) operation         Per item

Page 6: Java one 2015 [con3339]

Batch app categories

• File driven: records or values are retrieved from files
• Database driven: rows or values are retrieved from a database
• Message driven: messages are retrieved from a message queue
• Combination

Page 7: Java one 2015 [con3339]

Batch procedure

Stream: Job A → Job B → Job C → …

Each job (card/step) follows the same pattern:
• Job A: Input A → Process A → Output A
• Job B: Input B → Process B → Output B
• Job C: Input C → Process C → Output C

“Job Net” or “Job Stream” comes from the JCL era. (JCL itself doesn’t provide it.)

Page 8: Java one 2015 [con3339]

Agenda

What’s Batch?

History of batch frameworks

Types of batch frameworks

Best practices

Demo

Conclusion

Page 9: Java one 2015 [con3339]

Simple History of Batch Processing in Enterprise

[Timeline figure, 1950 to 2010. Labels shown: Mainframe, JCL, COBOL, FORTRAN, PL/I, BASIC, C, CP/M Sub, MS-DOS Bat, UNIX Sh, Bash, Win NT Bat, PowerShell, VB, C#, Java, J2EE, Java EE, JSR 352, Hadoop]

Page 10: Java one 2015 [con3339]

Agenda

What’s Batch?

History of batch frameworks

Types of batch frameworks

Best practices

Demo

Conclusion

Page 11: Java one 2015 [con3339]

Super Legacy Batch Script (1960’s – 1990’s)

JCL

//ZD2015BZ JOB (ZD201010),'ZD2015BZ',GROUP=PP1,
// CLASS=A,MSGCLASS=H,NOTIFY=ZD2015,MSGLEVEL=(1,1)
//********************************************************
//* Unloading data procedure
//********************************************************
//UNLDP EXEC PGM=UNLDP,TIME=20
//STEPLIB DD DSN=ZD.DBMST.LOAD,DISP=SHR
// DD DSN=ZB.PPDBL.LOAD,DISP=SHR
// DD DSN=ZA.COBMT.LOAD,DISP=SHR
//CPT871I1 DD DSN=P201.IN1,DISP=SHR
//CUU091O1 DD DSN=P201.ULO1,DISP=(,CATLG,DELETE),
// SPACE=(CYL,(010,10),RLSE),UNIT=SYSDA,
// DCB=(RECFM=FB,LRECL=016,BLKSIZE=1600)
//SYSOUT DD SYSOUT=*

JES calls COBOL: Input → Proc → Output

Page 12: Java one 2015 [con3339]

Legacy Batch Script (1980’s – 2000’s)

• Windows Task Scheduler calls a command.com Bat file
• Linux Cron calls a Bash shell script

Page 13: Java one 2015 [con3339]

Modern Batch Implementation

Java or .NET Framework (ignore the latter for now)

Page 14: Java one 2015 [con3339]

Java Batch Design patterns

1. POJO

2. Custom Framework

3. EJB / CDI

4. EJB with embedded container

5. JSR-352

Page 15: Java one 2015 [con3339]

1. POJO Batch with PreparedStatement object

✦ Create a connection and SQL statements with placeholders.

✦ Set auto-commit to false using setAutoCommit().

✦ Create a PreparedStatement object using one of the prepareStatement() methods.

✦ Add as many SQL statements as you like to the batch using the addBatch() method on the created statement object.

✦ Execute the SQL statements using the executeBatch() method on the created statement object, calling commit() after every chunk to persist the changes.

Page 16: Java one 2015 [con3339]

1. Batch with PreparedStatement object

Connection conn = DriverManager.getConnection("jdbc:~~~~~~~");
conn.setAutoCommit(false);
String query = "INSERT INTO User(id, first, last, age) "
        + "VALUES(?, ?, ?, ?)";
PreparedStatement pstmt = conn.prepareStatement(query);
for (int i = 0; i < userList.size(); i++) {
    User usr = userList.get(i);
    pstmt.setInt(1, usr.getId());
    pstmt.setString(2, usr.getFirst());
    pstmt.setString(3, usr.getLast());
    pstmt.setInt(4, usr.getAge());
    pstmt.addBatch();
    // Flush and commit after every 20 rows (one chunk).
    if ((i + 1) % 20 == 0) {
        pstmt.executeBatch();
        conn.commit();
    }
}
// Flush and commit the final, possibly partial, chunk.
pstmt.executeBatch();
conn.commit();
....

✓ Most efficient for batch SQL statements.

✓ All manual operations.
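The flush-and-commit rhythm of this loop can be tried without a database. The sketch below is only an illustration under stated assumptions: `BatchSink` and `writeInChunks` are hypothetical names, with `add` standing in for `addBatch()` and `flushAndCommit` standing in for `executeBatch()` followed by `commit()`.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkCommitSketch {

    /** Hypothetical stand-in for a PreparedStatement plus its Connection. */
    public interface BatchSink<T> {
        void add(T item);        // plays the role of pstmt.addBatch()
        void flushAndCommit();   // plays the role of executeBatch() + commit()
    }

    /** Feeds items to the sink, committing once per full chunk; returns the number of commits. */
    public static <T> int writeInChunks(List<T> items, int chunkSize, BatchSink<T> sink) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int pending = 0;
        int commits = 0;
        for (T item : items) {
            sink.add(item);
            pending++;
            if (pending == chunkSize) {
                sink.flushAndCommit();
                commits++;
                pending = 0;
            }
        }
        // Do not forget the last, partially filled chunk.
        if (pending > 0) {
            sink.flushAndCommit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 45; i++) {
            ids.add(i);
        }
        List<Integer> buffer = new ArrayList<>();
        int commits = writeInChunks(ids, 20, new BatchSink<Integer>() {
            @Override
            public void add(Integer id) {
                buffer.add(id);
            }

            @Override
            public void flushAndCommit() {
                System.out.println("commit " + buffer.size() + " rows");
                buffer.clear();
            }
        });
        System.out.println("commits = " + commits); // 45 rows in chunks of 20: 3 commits
    }
}
```

Counting pending items, rather than testing the loop index, keeps the first commit from firing on an almost empty batch and makes the trailing partial chunk explicit.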

Page 17: Java one 2015 [con3339]

1. Benefits of Prepared Statements

Regular statement, on every call: Parsing of SQL query → Compilation of SQL query → Planning & optimization of data retrieval path → Execution

Prepared statement: Create PreparedStatement (once) → Execution (repeated)

✓ Prevents SQL injection
✓ Dynamic queries
✓ Faster
✓ Object oriented
✘ FORWARD_ONLY result set
✘ IN clause limitation
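The "IN clause limitation" presumably refers to the fact that a single `?` placeholder binds exactly one value, so a variable-length IN list needs one placeholder per element. A minimal sketch of the usual workaround follows; the class and method names are hypothetical, and the table and columns simply reuse the User example from the previous slide.

```java
import java.util.Collections;

public class InClauseSketch {

    /** Builds "?, ?, ?" with one placeholder per value to bind. */
    public static String placeholders(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        return String.join(", ", Collections.nCopies(count, "?"));
    }

    /** Builds a SELECT whose IN list has room for idCount bound values. */
    public static String selectByIds(int idCount) {
        return "SELECT id, first, last, age FROM User WHERE id IN ("
                + placeholders(idCount) + ")";
    }

    public static void main(String[] args) {
        // prints: SELECT id, first, last, age FROM User WHERE id IN (?, ?, ?)
        System.out.println(selectByIds(3));
    }
}
```

Each value is then bound with its own setInt() call. Because the SQL text changes with the list length, each distinct length is a different statement to prepare.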

Page 18: Java one 2015 [con3339]

2. Custom framework via servlets

Pros
• Customizability, full control

Cons
• Tied to container or framework
• Sometimes poor transaction management
• Poor job control and monitoring
• No standard

Page 19: Java one 2015 [con3339]

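3. Batch using EJB or CDI

[Architecture diagram: a Job Scheduler remotely triggers a client, which makes a remote call (EJB @Remote or REST) to the EJB / CDI batch component (@Stateless / @Dependent) inside the Java EE App Server. The batch reads Input, runs Process, and writes Output, working with a Database, another system, or MQ through further @Stateless / @Dependent EJB / CDI components.]

Use the EJB Timer (@Schedule) to auto-trigger.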

Page 20: Java one 2015 [con3339]

3. Why EJB / CDI?

1. Remote Invocation: a client calls the EJB / CDI component over RMI-IIOP (EJB only), SOAP, REST, or WebSocket.

2. Automatic Transaction Management: the container issues BEGIN and COMMIT against the database.

3. Instance Pooling for Faster Operation (EJB only): instances are activated from the EJB instance pool.

4. Security Management (EJB only): applied to calls coming from the client.

Page 21: Java one 2015 [con3339]

3. EJB / CDI Pros

• Easiest to implement
• Batch with PreparedStatement in EJB works well in Java EE 6 for database batch operations
• Container-managed transactions (CMT) or @Transactional on CDI: automatic transaction system
• EJB has integrated security management
• EJB has instance pooling: faster business logic execution

Page 22: Java one 2015 [con3339]

3. EJB / CDI Cons

• EJB pools are not sized correctly for batch by default
• Hard limits must be set for the number of batches running at a time
• CMT / CDI @Transactional is sometimes not efficient for bulk operations; custom scoping needs to be combined with the “REQUIRES_NEW” transaction type
• EJB passivation: stateful session beans go passive at the wrong intervals
• JPA EntityManager and entities are not efficient for batch operations
• Memory constraints on session beans: need to be tweaked for larger jobs
• Abnormal end of a batch might shut down the JVM
• When terminated immediately, the app server also gets killed
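One way to picture a hard limit on concurrently running batches is a counting semaphore. The sketch below uses plain Java only and is not a container feature; `BatchLimiter` and `tryRun` are hypothetical names, and a real deployment might rely on the app server's own pool settings instead.

```java
import java.util.concurrent.Semaphore;

public class BatchLimiter {

    private final Semaphore permits;

    public BatchLimiter(int maxConcurrentBatches) {
        this.permits = new Semaphore(maxConcurrentBatches);
    }

    /**
     * Runs the job if a slot is free. Returns false, without running the job,
     * when the configured limit has already been reached.
     */
    public boolean tryRun(Runnable job) {
        if (!permits.tryAcquire()) {
            return false;
        }
        try {
            job.run();
            return true;
        } finally {
            // Always give the slot back, even if the job throws.
            permits.release();
        }
    }

    public static void main(String[] args) {
        BatchLimiter limiter = new BatchLimiter(1);
        boolean outer = limiter.tryRun(() -> {
            // The first batch holds the only slot, so a second one is rejected.
            boolean inner = limiter.tryRun(() -> System.out.println("should not run"));
            System.out.println("nested batch accepted: " + inner);
        });
        System.out.println("first batch accepted: " + outer);
    }
}
```

Rejecting the extra batch immediately, instead of blocking, leaves the retry decision to the scheduler that triggered it.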

Page 23: Java one 2015 [con3339]

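4. Batch using EJB / CDI on Embedded container

[Architecture diagram: a Job Scheduler remotely triggers a self-booting process that starts an Embedded EJB Container hosting the EJB / CDI batch (@Stateless / @Dependent). The batch reads Input, runs Process, and writes Output, working with a Database, another system, or MQ.]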

Page 24: Java one 2015 [con3339]

4. How?

pom.xml (case of GlassFish)

<dependency>
    <groupId>org.glassfish.main.extras</groupId>
    <artifactId>glassfish-embedded-all</artifactId>
    <version>4.1</version>
    <scope>test</scope>
</dependency>

EJB / CDI

@Stateless // for CDI, use @Dependent @Transactional instead
public class SampleClass {

    public String hello(String message) {
        return "Hello " + message;
    }
}

Page 25: Java one 2015 [con3339]

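4. How (Part 2)

JUnit Test Case

import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;

import javax.ejb.embeddable.EJBContainer;
import javax.naming.Context;
import javax.naming.NamingException;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class SampleClassTest {
    private static EJBContainer ejbContainer;
    private static Context ctx;

    @BeforeClass
    public static void setUpClass() throws Exception {
        // Boot the embedded container once for all tests.
        ejbContainer = EJBContainer.createEJBContainer();
        ctx = ejbContainer.getContext();
    }

    @AfterClass
    public static void tearDownClass() throws Exception {
        ejbContainer.close();
    }

    @Test
    public void hello() throws NamingException {
        SampleClass sample = (SampleClass) ctx.lookup("java:global/classes/SampleClass");
        assertNotNull(sample);
        String expected = "World";
        String hello = sample.hello(expected);
        assertNotNull(hello);
        assertTrue(hello.endsWith(expected));
    }
}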

Page 26: Java one 2015 [con3339]

4. Should I use embedded container?

Pros
✦ Quick to start (~10s)
✦ Efficient for batch implementations
✦ Embedded container uses less disk space and main memory
✦ Allows maximum reusability of enterprise components

Cons
✘ Inbound RMI-IIOP calls are not supported (on EJB)
✘ Message-driven beans (MDBs) are not supported
✘ Cannot be clustered for high availability

Page 27: Java one 2015 [con3339]

5. JSR-352

Implement artifacts → Orchestrate execution → Execute

Page 28: Java one 2015 [con3339]

5. Programming model

• Chunk and Batchlet models
• Chunk: reader → processor → writer
• Batchlets: DYOT step; invoke and return a code upon completion; stoppable
• Contexts: for runtime info and interim data persistence
• Callback hooks (listeners) for lifecycle events
• Parallel processing on jobs and steps
• Flow: one or more steps executed sequentially
• Split: collection of concurrently executed flows
• Partitioning: each step runs on multiple instances with unique properties
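The chunk model's reader, processor, and writer cycle can be approximated in plain Java. This is a rough sketch of the idea only and does not use the javax.batch API: `runChunkStep` is a hypothetical helper, with an `Iterator` playing the reader, a `Function` the processor, and a `Consumer` of lists the writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class ChunkStepSketch {

    /**
     * Reads items one at a time, processes each, and hands the results to the
     * writer in lists of at most itemCount entries. Returns the number of chunks written.
     */
    public static <I, O> int runChunkStep(Iterator<I> reader,
                                          Function<I, O> processor,
                                          Consumer<List<O>> writer,
                                          int itemCount) {
        if (itemCount < 1) {
            throw new IllegalArgumentException("itemCount must be at least 1");
        }
        int chunks = 0;
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == itemCount) {
                writer.accept(buffer);      // one write per full chunk
                buffer = new ArrayList<>(); // fresh buffer for the next chunk
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {
            writer.accept(buffer);          // trailing partial chunk
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        Function<String, String> processor = s -> s.toUpperCase();
        Consumer<List<String>> writer = chunk -> System.out.println("write " + chunk);
        int chunks = runChunkStep(input.iterator(), processor, writer, 2);
        System.out.println("chunks = " + chunks); // 5 items, 2 per chunk: 3 chunks
    }
}
```

A real batch runtime would also wrap each chunk in a transaction and record a checkpoint; both are left out here to keep the loop visible.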

Page 29: Java one 2015 [con3339]

5. Batch Chunks

Page 30: Java one 2015 [con3339]

5. Programming model

• Job operator: job management
• Job repository
• JobInstance - basically run()
• JobExecution - attempt to run()
• StepExecution - attempt to run() a step in a job

JobOperator jo = BatchRuntime.getJobOperator();
// start() returns the ID of the newly created job execution
long executionId = jo.start("sample", new Properties());

Page 31: Java one 2015 [con3339]

5. JSR-352

Chunk

Page 32: Java one 2015 [con3339]

5. Programming model

• JSL (Job Specification Language): XML-based batch job definition

Page 33: Java one 2015 [con3339]

5. JCL & JSL

JCL (1970’s), run by JES, calls COBOL:

//ZD2015BZ JOB (ZD201010),'ZD2015BZ',GROUP=PP1,
// CLASS=A,MSGCLASS=H,NOTIFY=ZD2015,MSGLEVEL=(1,1)
//********************************************************
//* Unloading data procedure
//********************************************************
//UNLDP EXEC PGM=UNLDP,TIME=20
//STEPLIB DD DSN=ZD.DBMST.LOAD,DISP=SHR
// DD DSN=ZB.PPDBL.LOAD,DISP=SHR
// DD DSN=ZA.COBMT.LOAD,DISP=SHR
//CPT871I1 DD DSN=P201.IN1,DISP=SHR
//CUU091O1 DD DSN=P201.ULO1,DISP=(,CATLG,DELETE),
// SPACE=(CYL,(010,10),RLSE),UNIT=SYSDA,
// DCB=(RECFM=FB,LRECL=016,BLKSIZE=1600)
//SYSOUT DD SYSOUT=*

JSR 352 “JSL” (2010’s), run by a Java EE App Server, calls a JSR 352 Chunk or Batchlet:

<?xml version="1.0" encoding="UTF-8"?>
<job id="my-chunk" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <properties>
        <property name="inputFile" value="input.txt"/>
        <property name="outputFile" value="output.txt"/>
    </properties>
    <step id="step1">
        <chunk item-count="20">
            <reader ref="myChunkReader"/>
            <processor ref="myChunkProcessor"/>
            <writer ref="myChunkWriter"/>
        </chunk>
    </step>
</job>

In both cases the called program follows Input → Proc → Output.

Page 34: Java one 2015 [con3339]

5. Spring 3.0 Batch (JSR-352)

Page 35: Java one 2015 [con3339]

5. Spring batch

• API for building batch components integrated with Spring framework
• Implementations for Readers and Writers
• An SDL (JSL) for configuring batch components
• Tasklets (Spring batchlet): collections of custom batch steps/tasks
• Flexibility to define complex steps
• Job repository implementation
• Batch process lifecycle management made a bit easier

Page 36: Java one 2015 [con3339]

5. Main differences

              Spring             JSR-352
DI            Bean definitions   Job definition (optional)
Properties    Any type           String only

Page 37: Java one 2015 [con3339]

Appendix: Apache Hadoop

Apache Hadoop is a scalable storage and batch data processing system.

• MapReduce programming model
• Hassle-free parallel job processing
• Reliable: all blocks are replicated 3 times
• Databases: built-in tools to dump or extract data
• Fault tolerance through software, self-healing and auto-retry
• Best for unstructured data (log files, media, documents, graphs)

Page 38: Java one 2015 [con3339]

Appendix: What Hadoop is not for

• Not for small or real-time data; >1TB is min.
• Procedure oriented: writing code is painful and error prone. YAGNI
• Potential stability and security issues
• Joins of multiple datasets are tricky and slow
• Cluster management is hard
• Still single master, which requires care and may limit scaling
• Does not allow for stateful multiple-step processing of records

Page 39: Java one 2015 [con3339]

Agenda

What’s Batch?

History of batch frameworks

Types of batch frameworks

Best practices

Demo

Conclusion

Page 40: Java one 2015 [con3339]

Key points to consider

• Business logic
• Transaction management
• Exception handling
• File processing
• Job control/monitor (retry/restart policies)
• Memory consumed by job
• Number of processes
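As one illustration of a retry policy, a step can be re-run a bounded number of times before the failure is reported to the scheduler. The sketch below is an assumption-laden example in plain Java, not an API of any batch framework; `RetrySketch` and `runWithRetry` are hypothetical names.

```java
import java.util.function.Supplier;

public class RetrySketch {

    /**
     * Runs the step up to maxAttempts times and returns its first successful result.
     * If every attempt fails, the last failure is rethrown to the caller.
     */
    public static <T> T runWithRetry(Supplier<T> step, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.get();
            } catch (RuntimeException e) {
                last = e;
                System.out.println("attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Supplier<String> flakyStep = () -> {
            // Fails on the first two calls, then succeeds.
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "done";
        };
        String result = runWithRetry(flakyStep, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A production policy would usually distinguish retryable from fatal exceptions and wait between attempts; both are omitted here.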

Page 41: Java one 2015 [con3339]

Best practices

• Always poll in batches
• Processor: thread-safe, stateless
• Throttling policy when using queues
• Storing results in memory is risky
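Polling in batches and throttling a queue can both be shown with the standard `java.util.concurrent` classes. The sketch below is one possible reading of those two points, not the presenters' implementation; `BatchPollSketch` and `pollBatch` are hypothetical names, and the capacity of 5 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchPollSketch {

    /** Removes up to maxBatch queued items in one call instead of polling item by item. */
    public static <T> List<T> pollBatch(BlockingQueue<T> queue, int maxBatch) {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    public static void main(String[] args) {
        // A bounded queue is one simple throttle: offer() returns false when it is full.
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(5);
        for (int i = 1; i <= 7; i++) {
            if (!queue.offer(i)) {
                System.out.println("throttled: " + i);
            }
        }
        // Consume the backlog two items at a time.
        List<Integer> batch = pollBatch(queue, 2);
        while (!batch.isEmpty()) {
            System.out.println("processing " + batch);
            batch = pollBatch(queue, 2);
        }
    }
}
```

With a capacity of 5, items 6 and 7 are rejected, and the remaining five are processed as [1, 2], [3, 4], and [5].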

Page 42: Java one 2015 [con3339]

Agenda

What’s Batch?

History of batch frameworks

Types of batch frameworks

Best practices

Demo

Conclusion

Page 43: Java one 2015 [con3339]

Agenda

What’s Batch?

History of batch frameworks

Types of batch frameworks

Best practices

Demo

Conclusion

Page 44: Java one 2015 [con3339]

Conclusion: Script vs Java

Shell Script Based (Bash, PowerShell, etc.)

Pros
• Super quick to write one
• Easy testing

Cons
• Narrower scope of implementation
• No transaction management
• Poor error handling
• Poor operation management

Java Based (Java EE, POJO, etc.)

Pros
• Power of Java APIs or Java EE APIs
• Platform independent
• Accuracy of error handling
• Container transaction management (Java EE)
• Operational management (Java EE)

Cons
• Sometimes takes more time to make
• Sometimes difficult to test

Page 45: Java one 2015 [con3339]

Conclusion

POJO
Pros: quick to write; Java; easy testing
Cons: no standard; no transaction management; less operation management

Custom Framework
Pros: depends on each product
Cons: no standard; depends on each product

EJB / CDI
Pros: super power of Java EE; standardized
Cons: difficult to test; cannot stop forcefully; no auto chunk or parallel operations

EJB / CDI + Embedded Container
Pros: super power of Java EE; standardized; easy testing; can stop forcefully
Cons: no auto chunk or parallel operations

JSR 352
Pros: super power of Java EE; standardized; easy testing; auto chunk, parallel operations
Cons: new!; cannot stop immediately in case of chunks

Platform labels: Java EE 6 (EJB / CDI options), Java EE 7 (JSR 352)

Page 46: Java one 2015 [con3339]

Contact

Arshal (@AforArsh)
Hirofumi Iwasaki (@HirofumiIwasaki)