COBOL to Apache Spark
Transcript of COBOL to Apache Spark
Oct 28, 2017
Ville Misaki
System Strategy Department,
Rakuten Card Co., Ltd.
2
Ville Misaki
Senior Software Engineer
Technology Strategy Group,
System Strategy Department,
Rakuten Card Co., Ltd
Career
15+ years; 3 years at Rakuten
In Finland, the Netherlands, Japan
Java (EE), Perl, C++, web systems, relational
databases, performance optimization & security
3
Oracle OpenWorld 2017
Case Study: Credit Card Core System
with Exalogic, Exadata, Oracle Cloud
Machine (CON4994) => Link
JavaOne 2017
Java EE 7 with Apache Spark for the
World’s Largest Credit Card Core
Systems (CON4998) => Link
4
Part 1 – Perfect Design
1. About Rakuten Card
2. Background
3. Platform Migration
4. Data Migration
5. Software Migration
Part 2 – Harsh Reliability
6. Performance
7. Apache Spark
8. Judgement Day
9. Into the Future
5
6
Unified brand, ecosystems around the world.
7
Top-level credit card company in Japan
Core of the Rakuten ecosystem
3rd in total transaction volume in 2016
Growing rapidly
8
9
Core Systems
Web Systems
External Systems
Intra Systems
10
Mainframe
Old architecture – >20 years
High cost
Limited capacity and
performance
Low maintainability
Vendor lock-in
Limited security
For more details, check session
“From Mainframe to Java EE” at
16:00 today
11
Phases of the improvement – at 3.0
1.0 Initial phase: outsource-based, just started; vendor lock-in. (Achieved)
2.0 In-house development: in-house development, differentiating with lower costs and faster delivery. (Achieved)
3.0 Standardization: standardized system architecture, both for hardware and software. (Current Standard Architecture)
12
13
Core Systems
Old: Mainframe
New: Oracle Exalogic + Exadata + ZFS Servers
14
App Server
Old: COBOL → New: WebLogic Server
Financial de-facto standard. Java EE compliant. Matured, from 1997.
Database
Old: Network DB → New: Oracle Database
Financial de-facto standard. ISO/IEC 9075 SQL compliant. Matured, from 1983.
15
16
ISAM/VSAM, NDB → Oracle Database
Copy & Convert
17
Data Conversion
Network database to relational database
ISAM/VSAM data to relational database
Legacy Japanese character set to Unicode
Fix data inconsistencies
Scale
Terabytes of live production data
In less than 24 hours
18
Offline migration
Freeze data during migration
Full migration – not incremental
Customers mostly unaffected
Data & System migration
At the same time
Cannot be split into phases
Cached
19
ISAM/VSAM, NDB → (Replication) → Mirror ISAM/VSAM, NDB → (Copy & Convert) → Oracle Database
20
21
Layers: Requirements → Source code → Application → Platform → Hardware
Approaches: Reimplement (from requirements), Convert (the source code), Emulate (the platform/hardware)
22
Reimplement
Pro: Optimal performance; Low maintenance cost
Con: Expensive; Takes a long time; Risky; Difficult to test
Emulate
Pro: Development unchanged; Easy to test; Easy to migrate
Con: Development unchanged; Low performance; Future questionable
Convert
Pro: Flexible cost vs. schedule; Case-by-case fixes; Easy to test
Con: Legacy code remains; Low-performance points need to be addressed
Requirements?
23
Reimplement
Pro: Optimal performance; Low maintenance cost
Con: Expensive; Takes a long time; Risky; Difficult migration
Emulate
Pro: Development unchanged; Easy to test; Easy to migrate
Con: Development unchanged; Low performance; Future questionable
Convert
Pro: Flexible cost vs. schedule; Case-by-case fixes; Easy to test
Con: Legacy code remains; Low-performance points need to be addressed
Requirements: 2x performance – No regression – Minimal downtime
25
Japanese COBOL source code → [customized source code converter] → Java source code
Convert from Japanese COBOL to Java EE
Keep original core business logic
26
Old: Japanese COBOL
Japanese source code. Almost abandoned – no books, no community.
New: “Dual Source Architecture”
Java – from Web Systems, for new logic.
COBOL – from the old system, converted to Java.
Ease of migration, resource re-use. Introduce the power of Java EE. Introduce a converter from YPS to Java.
27
Two sources, single binary – easy to operate.
New Logic (Java EE): Java → Compile → Java byte code
Legacy Logic (Mainframe): Japanese COBOL → [Converter] Convert to COBOL → COBOL → Convert to Java → Java → Compile → Java byte code
Build: single WAR → Deploy → Application Server (Java EE)
28
Old vs. New architecture:
External customers → BIG-IP → Façade → Real-time Servers (WebLogic)
Operation terminal (web browser, intranet) → BIG-IP → Façade
Scheduler → Batch Servers (Spark & Java)
Rich clients and External/Intra systems → BIG-IP → Façade
Core business logic and APIs run against Exadata; forms are produced as output.
29
Part 1 – Perfect Design
1. About Rakuten Card
2. Background
3. Platform Migration
4. Data Migration
5. Software Migration
Part 2 – Harsh Reliability
6. Performance
7. Apache Spark
8. Judgement Day
9. Into the Future
30
31
32
33
Batches are run as networks
Hierarchical
Critical path
Time window
[Diagram: a batch network from Start, with slow batches on the critical path]
34
Automatic code conversion
COBOL program flow emulated in Java
COBOL-like data structures in Java
DB access logic
Business logic built on network DB
NDB and RDB are good at different tasks
35
COBOL vs. Java
GOTO statements – imitating them is complex
Sub-program calls – heavy
No local variables – tight coupling
No libraries – copy & paste code
Few shared data structures – copy & paste definitions
No shared enums/constants – magic numbers
36
COBOL data structures
Fixed length – hard-coded
String-based
Data block inside program
Often thousands of fields
Hierarchical fields
Content is joined/split automatically
Variable namespace under each parent
Even five levels deep
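These traits can be sketched in Java. A minimal, hypothetical example (the field layout and class name are illustrative, not from the actual converter): a fixed-length record whose hierarchical child fields are substrings at hard-coded offsets, joined and split automatically through the parent.

```java
// Sketch of emulating a COBOL group item in Java (hypothetical layout):
//   01 CUSTOMER.
//      05 NAME  PIC X(10).
//      05 BIRTH.
//         10 BIRTH-YEAR  PIC 9(4).
//         10 BIRTH-MONTH PIC 9(2).
// The whole record is one fixed-length string; child fields are
// substrings, so writing a parent implicitly rewrites its children.
public class CustomerRecord {
    private final StringBuilder data; // the whole record as one buffer

    public CustomerRecord(String raw) {
        if (raw.length() != 16) throw new IllegalArgumentException("fixed length 16");
        this.data = new StringBuilder(raw);
    }

    // Parent access: the full 16-character record.
    public String customer()  { return data.toString(); }
    // Child fields are substrings at hard-coded offsets, as in COBOL.
    public String name()      { return data.substring(0, 10).trim(); }
    public String birth()     { return data.substring(10, 16); }           // parent of year+month
    public int birthYear()    { return Integer.parseInt(data.substring(10, 14)); }
    public int birthMonth()   { return Integer.parseInt(data.substring(14, 16)); }

    // Writing the parent field overwrites all of its children at once.
    public void setBirth(String yyyymm) { data.replace(10, 16, yyyymm); }
}
```

In the real system such layouts often have thousands of fields and several levels; this two-level example only shows the join/split mechanism.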
37
38
Logic optimized for NDB
Read sequentially
Data pre-sorted
Data pre-formatted
Emulate in RDB
Uphill battle
                   NDB    RDB
Search             Slow   Fast
Sequential access  Fast   Slow
Sorting            Slow   Fast
Formatting         Fast   Slow
39
New system must be faster
Time until launch:
1 year
40
Options?
Redesign and re-implement from scratch
Not feasible
Optimize framework
Limited effectiveness
Parallelize batches
Elastic brute-force
41
42
Time
Sequential
Parallel
43
[Diagram] Scheduler → Bootstrap → Cluster Nodes (×6) with Shared Memory
44
1. Making business logic parallel
Independent processing
2. I/O
Data transferred over network
3. Data ordering
Shuffles
45
Problem: input data rows are not independent!
Red flags
Fields not initialized for each row
Code forks early (header & data?)
Legacy code analysis
Refactor
Fields to local variables
Extract data structures
Initialize data for each row
Run & see
[Diagram: rows 1, 2, 3 – does a later row reference an earlier one?]
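The refactoring can be sketched as follows. This is an illustrative toy, not actual migrated code: a program-wide field that is never re-initialized (the red flag) versus the same logic with the field turned into a per-row local variable.

```java
import java.util.List;

// Sketch of the "fields to local variables" refactoring. In converted
// COBOL, working-storage fields live for the whole program run, so
// state silently leaks from one input row into the next.
public class RowTotals {
    // Before: a program-wide field, not initialized for each row.
    static int leakyTotal = 0;

    static int leakyProcess(List<Integer> row) {
        for (int v : row) leakyTotal += v;   // carries over from previous rows!
        return leakyTotal;
    }

    // After: the field becomes a local variable, initialized per row.
    // Rows are now independent and can be processed on any cluster node.
    static int process(List<Integer> row) {
        int total = 0;                       // fresh state for every row
        for (int v : row) total += v;
        return total;
    }
}
```

Calling `leakyProcess` twice on the same row gives different answers; `process` is deterministic per row, which is exactly the independence Spark needs.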
46
1. Group related rows together
2. Process header rows separately
3. Modify business logic
47
Group related rows together: custom data reader
Multiple rows behave like one row
Process each group row in a loop, on the same node
Pro
Business logic not modified
Con
Relationships may be too complex
Groups may grow too big
ID Data
1 …
1 …
2 …
3 …
3 …
4 …
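A minimal plain-Java stand-in for this reader (the `Row` type is hypothetical; in Spark this would be a `groupBy`/`groupByKey` over an RDD): rows sharing an ID are gathered into one group, so the whole group lands on the same node and can be fed to the unmodified business logic one member at a time.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: grouping related rows so each group behaves like one row.
public class GroupReader {
    record Row(int id, String data) {}

    static Map<Integer, List<Row>> groupById(List<Row> rows) {
        // In Spark this would be rdd.groupBy(Row::id); the effect is the
        // same: all rows of a group end up together, in input order.
        return rows.stream().collect(
            Collectors.groupingBy(Row::id, LinkedHashMap::new, Collectors.toList()));
    }
}
```

As the slide warns, this breaks down when relationships span many IDs or a single group grows too large for one node.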
48
Process header rows separately: run business logic for header rows first
Collect the results in a NavigableMap
Run business logic for data rows
Initialize data from the previous header: floorKey(dataRowIndex)
Pro
Minimal changes to business logic
Con
Relationships may be too complex
ID Type Data
1 Head …
1 Data …
1 Data …
2 Head …
2 Data …
3 Head …
3 Data …
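The floorKey technique above can be sketched with plain JDK types (the row indices and header names are hypothetical): after the header pass fills the map, each data row looks up the closest preceding header by its own row index.

```java
import java.util.NavigableMap;

// Sketch: initializing a data row from the nearest preceding header row.
// headers maps (row index of a header) -> (result of running the
// business logic for that header).
public class HeaderLookup {
    static String headerFor(NavigableMap<Long, String> headers, long dataRowIndex) {
        // floorKey returns the greatest key <= dataRowIndex,
        // i.e. the header at or before this data row.
        Long key = headers.floorKey(dataRowIndex);
        return key == null ? null : headers.get(key);
    }
}
```

Because the map is shared and read-only during the data pass, the data rows can then be processed in parallel.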
49
Modify business logic: a row relationship could be removed, if it is
Unintentional (a bug)
An unnecessary optimization
Data that could be retrieved otherwise
Pro
High chance for good performance
Con
High chance for new bugs
50
Input and output data must be shared
Network storage
How long does it take to copy 200 GB?
[Diagram: transfer–process–transfer chains; with light processing the transfers dominate, with heavy processing they are amortized]
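A back-of-the-envelope answer to the question above, assuming a 10 Gb/s link at full utilization (an assumed figure, not from the talk): 200 GB moves in under three minutes, so heavy processing amortizes the transfer while light processing is dominated by it.

```java
// Sketch: transfer time for N gigabytes over a link of G gigabits/s.
// 10 Gb/s = 1.25e9 bytes/s, so 200 GB takes 200e9 / 1.25e9 = 160 s.
public class TransferTime {
    static double seconds(double gigabytes, double gigabitsPerSecond) {
        double bytesPerSecond = gigabitsPerSecond * 1e9 / 8; // bits -> bytes
        return gigabytes * 1e9 / bytesPerSecond;
    }
}
```

Real throughput is lower (protocol overhead, storage contention), so the actual copy would take longer; the point is the order of magnitude.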
51
Sequential batches rely on ordering
Tricky to keep in Spark
Safe operations: map, filter, zip
Unsafe operations: join, group, sort
[Diagram: per-partition processing chains with no shuffle vs. chains interrupted by shuffles]
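One common way to survive an order-destroying operation (a sketch with plain Java collections standing in for Spark's zipWithIndex/sortByKey pattern; not necessarily what the team did): tag each element with its original index before the unsafe operation, then sort by the saved index afterwards to restore the sequential batch's ordering.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: restoring input order after a shuffle. shuffled maps
// (original index) -> (element), in arbitrary order; sorting by the
// saved index recovers the sequence. In Spark: zipWithIndex() before
// the unsafe operation, sortByKey() after it.
public class OrderKeeper {
    static List<String> restoreOrder(Map<Long, String> shuffled) {
        return shuffled.entrySet().stream()
            .sorted(Map.Entry.comparingByKey())
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());
    }
}
```

The extra sort is itself a shuffle, so this trick is only worthwhile when the unsafe operation in the middle is unavoidable.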
52
Good for
Heavy processing
Independent input data records
One input, multiple outputs
Unordered data
Not so great for
Little processing
Dependencies between data records
Merging multiple data sources
53
54
55
[Diagram: migration weekend timeline – Saturday, Sunday, Monday – data migration steps 1–3]
56
57
58
Next Phase – 4.0
1.0 Initial phase: outsource-based, just started; vendor lock-in. (Achieved)
2.0 In-house development: in-house development, differentiating with lower costs and faster delivery. (Achieved)
3.0 Standardization: standardized system architecture, both for hardware and software. (Current Standard Architecture)
4.0 Data Optimized: overwhelming differentiation, with an enabling architecture for customer-centric service. (Next)