COBOL to Apache Spark
Transcript of COBOL to Apache Spark
Oct 28, 2017
Ville Misaki
System Strategy Department,
Rakuten Card Co., Ltd.
2
Ville Misaki
Senior Software Engineer
Technology Strategy Group,
System Strategy Department,
Rakuten Card Co., Ltd
Career
15+ years; 3 years at Rakuten
In Finland, the Netherlands, Japan
Java (EE), Perl, C++, web systems, relational
databases, performance optimization & security
3
Oracle OpenWorld 2017
Case Study: Credit Card Core System
with Exalogic, Exadata, Oracle Cloud
Machine (CON4994) => Link
JavaOne 2017
Java EE 7 with Apache Spark for the
World’s Largest Credit Card Core
Systems (CON4998) => Link
4
Part 1 – Perfect Design
1. About Rakuten Card
2. Background
3. Platform Migration
4. Data Migration
5. Software Migration
Part 2 – Harsh Reliability
6. Performance
7. Apache Spark
8. Judgement Day
9. Into the Future
5
6
Unified brand, ecosystems around the world.
7
Top-level credit card company in Japan
Core of the Rakuten ecosystem
3rd in total transaction volume in 2016
Growing rapidly
8
9
Core Systems
Web Systems
External Systems
Intra Systems
10
Mainframe
Old architecture – >20 years
High cost
Limited capacity and
performance
Low maintainability
Vendor lock-in
Limited security
For more details, check session
“From Mainframe to Java EE” at
16:00 today
11
Phases of the improvement – at 3.0
1.0 Initial phase: outsource-based, just started; vendor lock-in. (Achieved)
2.0 In-house development: in-house development, differentiating with lower costs and faster delivery. (Achieved)
3.0 Standardization: standardized system architecture, both for hardware and software. (Current Standard Architecture)
12
13
Core Systems
Old: Mainframe
New: Oracle Exalogic + Exadata + ZFS Servers
14
App Server
Old: COBOL → New: WebLogic Server
Financial de-facto standard. Java EE compliant. Matured, from 1997.
Database
Old: Network DB → New: Oracle Database
Financial de-facto standard. ISO/IEC 9075 SQL compliant. Matured, from 1983.
15
16
ISAM/VSAM, NDB → Oracle Database
Copy & Convert
17
Data Conversion
Network database to relational database
ISAM/VSAM data to relational database
Legacy Japanese character set to Unicode
Fix data inconsistencies
Scale
Terabytes of live production data
In less than 24 hours
18
Offline migration
Freeze data during migration
Full migration – not incremental
Customers mostly unaffected
Data & System migration
At the same time
Cannot be split into phases
Cached
19
ISAM/VSAM, NDB → (Replication) → Mirror ISAM/VSAM, NDB → (Copy & Convert) → Oracle Database
20
21
Layers: Requirements → Source code → Application → Platform → Hardware
Approaches: Reimplement (from requirements), Convert (the source code), Emulate (the platform/hardware)
22
Reimplement
Pro: Optimal performance; Low maintenance cost
Con: Expensive; Takes a long time; Risky; Difficult to test
Emulate
Pro: Development unchanged; Easy to test; Easy to migrate
Con: Development unchanged; Low performance; Future questionable
Convert
Pro: Flexible cost vs. schedule; Case-by-case fixes; Easy to test
Con: Legacy code remains; Low-performance points need to be addressed
Requirements?
23
Reimplement
Pro: Optimal performance; Low maintenance cost
Con: Expensive; Takes a long time; Risky; Difficult migration
Emulate
Pro: Development unchanged; Easy to test; Easy to migrate
Con: Development unchanged; Low performance; Future questionable
Convert
Pro: Flexible cost vs. schedule; Case-by-case fixes; Easy to test
Con: Legacy code remains; Low-performance points need to be addressed
Requirements: 2x performance – No regression – Minimal downtime
25
Japanese COBOL source code → [customized source code converter] → Java source code
Convert from Japanese COBOL to Java EE
Keep original core business logic
26
Old: Japanese COBOL
Japanese source code. Almost abandoned – no books, no community.
New: “Dual Source Architecture”
Java – from Web Systems, for new logic.
COBOL – from the old system, converted to Java.
Ease of migration, resource re-use. Introduce the power of Java EE. Introduce a converter from YPS to Java.
27
Two sources, single binary – easy to operate.
New Logic (Java EE): Java → Compile → Java byte code
Legacy Logic (Mainframe): Japanese COBOL → [Converter] Convert to COBOL → COBOL → Convert to Java → Java → Compile → Java byte code
Build: single WAR → Deploy → Application Server (Java EE)
28
Old vs. New architecture:
External customers → BIG-IP → Façade → Real-time Servers (WebLogic)
Operation terminal (web browser, intranet) → BIG-IP → Façade
Scheduler → Batch Servers (Spark & Java)
Rich clients and External/Intra systems → BIG-IP → Façade
Core business logic and APIs run against Exadata; forms are produced as output.
29
Part 1 – Perfect Design
1. About Rakuten Card
2. Background
3. Platform Migration
4. Data Migration
5. Software Migration
Part 2 – Harsh Reliability
6. Performance
7. Apache Spark
8. Judgement Day
9. Into the Future
30
31
32
33
Batches are run as networks
Hierarchical
Critical path
Time window
[Diagram: a batch network from Start, with slow batches on the critical path]
34
Automatic code conversion
COBOL program flow emulated in Java
COBOL-like data structures in Java
DB access logic
Business logic built on network DB
NDB and RDB are good at different tasks
35
COBOL vs. Java
GOTO statements – imitating them is complex
Sub-program calls – heavy
No local variables – tight coupling
No libraries – copy & paste code
Few shared data structures – copy & paste definitions
No shared enums/constants – magic numbers
36
COBOL data structures
Fixed length – hard-coded
String-based
Data block inside program
Often thousands of fields
Hierarchical fields
Content is joined/split automatically
Variable namespace under each parent
Even five levels deep
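These traits can be sketched in Java. A minimal, hypothetical example (the field layout and class name are illustrative, not from the actual converter): a fixed-length record whose hierarchical child fields are substrings at hard-coded offsets, joined and split automatically through the parent.

```java
// Sketch of emulating a COBOL group item in Java (hypothetical layout):
//   01 CUSTOMER.
//      05 NAME  PIC X(10).
//      05 BIRTH.
//         10 BIRTH-YEAR  PIC 9(4).
//         10 BIRTH-MONTH PIC 9(2).
// The whole record is one fixed-length string; child fields are
// substrings, so writing a parent implicitly rewrites its children.
public class CustomerRecord {
    private final StringBuilder data; // the whole record as one buffer

    public CustomerRecord(String raw) {
        if (raw.length() != 16) throw new IllegalArgumentException("fixed length 16");
        this.data = new StringBuilder(raw);
    }

    // Parent access: the full 16-character record.
    public String customer()  { return data.toString(); }
    // Child fields are substrings at hard-coded offsets, as in COBOL.
    public String name()      { return data.substring(0, 10).trim(); }
    public String birth()     { return data.substring(10, 16); }           // parent of year+month
    public int birthYear()    { return Integer.parseInt(data.substring(10, 14)); }
    public int birthMonth()   { return Integer.parseInt(data.substring(14, 16)); }

    // Writing the parent field overwrites all of its children at once.
    public void setBirth(String yyyymm) { data.replace(10, 16, yyyymm); }
}
```

In the real system such layouts often have thousands of fields and several levels; this two-level example only shows the join/split mechanism.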
37
38
Logic optimized for NDB
Read sequentially
Data pre-sorted
Data pre-formatted
Emulate in RDB
Uphill battle
                   NDB    RDB
Search             Slow   Fast
Sequential access  Fast   Slow
Sorting            Slow   Fast
Formatting         Fast   Slow
39
New system must be faster
Time until launch:
1 year
40
Options?
Redesign and re-implement from scratch
Not feasible
Optimize framework
Limited effectiveness
Parallelize batches
Elastic brute-force
41
42
Time
Sequential
Parallel
43
[Diagram] Scheduler → Bootstrap → Cluster Nodes (×6) with Shared Memory
44
1. Making business logic parallel
Independent processing
2. I/O
Data transferred over network
3. Data ordering
Shuffles
45
Problem: input data rows are not independent!
Red flags
Fields not initialized for each row
Code forks early (header & data?)
Legacy code analysis
Refactor
Fields to local variables
Extract data structures
Initialize data for each row
Run & see
[Diagram: rows 1, 2, 3 – does a later row reference an earlier one?]
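The refactoring can be sketched as follows. This is an illustrative toy, not actual migrated code: a program-wide field that is never re-initialized (the red flag) versus the same logic with the field turned into a per-row local variable.

```java
import java.util.List;

// Sketch of the "fields to local variables" refactoring. In converted
// COBOL, working-storage fields live for the whole program run, so
// state silently leaks from one input row into the next.
public class RowTotals {
    // Before: a program-wide field, not initialized for each row.
    static int leakyTotal = 0;

    static int leakyProcess(List<Integer> row) {
        for (int v : row) leakyTotal += v;   // carries over from previous rows!
        return leakyTotal;
    }

    // After: the field becomes a local variable, initialized per row.
    // Rows are now independent and can be processed on any cluster node.
    static int process(List<Integer> row) {
        int total = 0;                       // fresh state for every row
        for (int v : row) total += v;
        return total;
    }
}
```

Calling `leakyProcess` twice on the same row gives different answers; `process` is deterministic per row, which is exactly the independence Spark needs.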
46
1. Group related rows together
2. Process header rows separately
3. Modify business logic
47
Group related rows together: custom data reader
Multiple rows behave like one row
Process each group row in a loop, on the same node
Pro
Business logic not modified
Con
Relationships may be too complex
Groups may grow too big
ID Data
1 …
1 …
2 …
3 …
3 …
4 …
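A minimal plain-Java stand-in for this reader (the `Row` type is hypothetical; in Spark this would be a `groupBy`/`groupByKey` over an RDD): rows sharing an ID are gathered into one group, so the whole group lands on the same node and can be fed to the unmodified business logic one member at a time.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: grouping related rows so each group behaves like one row.
public class GroupReader {
    record Row(int id, String data) {}

    static Map<Integer, List<Row>> groupById(List<Row> rows) {
        // In Spark this would be rdd.groupBy(Row::id); the effect is the
        // same: all rows of a group end up together, in input order.
        return rows.stream().collect(
            Collectors.groupingBy(Row::id, LinkedHashMap::new, Collectors.toList()));
    }
}
```

As the slide warns, this breaks down when relationships span many IDs or a single group grows too large for one node.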
48
Process header rows separately: run business logic for header rows first
Collect the results in a NavigableMap
Run business logic for data rows
Initialize data from the previous header: floorKey(dataRowIndex)
Pro
Minimal changes to business logic
Con
Relationships may be too complex
ID Type Data
1 Head …
1 Data …
1 Data …
2 Head …
2 Data …
3 Head …
3 Data …
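The floorKey technique above can be sketched with plain JDK types (the row indices and header names are hypothetical): after the header pass fills the map, each data row looks up the closest preceding header by its own row index.

```java
import java.util.NavigableMap;

// Sketch: initializing a data row from the nearest preceding header row.
// headers maps (row index of a header) -> (result of running the
// business logic for that header).
public class HeaderLookup {
    static String headerFor(NavigableMap<Long, String> headers, long dataRowIndex) {
        // floorKey returns the greatest key <= dataRowIndex,
        // i.e. the header at or before this data row.
        Long key = headers.floorKey(dataRowIndex);
        return key == null ? null : headers.get(key);
    }
}
```

Because the map is shared and read-only during the data pass, the data rows can then be processed in parallel.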
49
Modify business logic: a row relationship could be removed, if it is
Unintentional (a bug)
An unnecessary optimization
Data that could be retrieved otherwise
Pro
High chance for good performance
Con
High chance for new bugs
50
Input and output data must be shared
Network storage
How long does it take to copy 200 GB?
[Diagram: transfer–process–transfer chains; with light processing the transfers dominate, with heavy processing they are amortized]
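A back-of-the-envelope answer to the question above, assuming a 10 Gb/s link at full utilization (an assumed figure, not from the talk): 200 GB moves in under three minutes, so heavy processing amortizes the transfer while light processing is dominated by it.

```java
// Sketch: transfer time for N gigabytes over a link of G gigabits/s.
// 10 Gb/s = 1.25e9 bytes/s, so 200 GB takes 200e9 / 1.25e9 = 160 s.
public class TransferTime {
    static double seconds(double gigabytes, double gigabitsPerSecond) {
        double bytesPerSecond = gigabitsPerSecond * 1e9 / 8; // bits -> bytes
        return gigabytes * 1e9 / bytesPerSecond;
    }
}
```

Real throughput is lower (protocol overhead, storage contention), so the actual copy would take longer; the point is the order of magnitude.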
51
Sequential batches rely on ordering
Tricky to keep in Spark
Safe operations: map, filter, zip
Unsafe operations: join, group, sort
[Diagram: per-partition processing chains with no shuffle vs. chains interrupted by shuffles]
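One common way to survive an order-destroying operation (a sketch with plain Java collections standing in for Spark's zipWithIndex/sortByKey pattern; not necessarily what the team did): tag each element with its original index before the unsafe operation, then sort by the saved index afterwards to restore the sequential batch's ordering.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: restoring input order after a shuffle. shuffled maps
// (original index) -> (element), in arbitrary order; sorting by the
// saved index recovers the sequence. In Spark: zipWithIndex() before
// the unsafe operation, sortByKey() after it.
public class OrderKeeper {
    static List<String> restoreOrder(Map<Long, String> shuffled) {
        return shuffled.entrySet().stream()
            .sorted(Map.Entry.comparingByKey())
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());
    }
}
```

The extra sort is itself a shuffle, so this trick is only worthwhile when the unsafe operation in the middle is unavoidable.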
52
Good for
Heavy processing
Independent input data records
One input, multiple outputs
Unordered data
Not so great for
Little processing
Dependencies between data records
Merging multiple data sources
53
54
55
[Diagram: migration weekend timeline – Saturday, Sunday, Monday – data migration steps 1–3]
56
57
58
Next Phase – 4.0
1.0 Initial phase: outsource-based, just started; vendor lock-in. (Achieved)
2.0 In-house development: in-house development, differentiating with lower costs and faster delivery. (Achieved)
3.0 Standardization: standardized system architecture, both for hardware and software. (Current Standard Architecture)
4.0 Data Optimized: overwhelming differentiation, with an enabling architecture for customer-centric service. (Next)