JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

© 2013 IBM Corporation 26 Sept 2013 CON 7370: Java Interprocess Communication Challenges in Low-Latency Deployments

JavaOne 2013 presentation for CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments.

Transcript of JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Page 1: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

© 2013 IBM Corporation

26 Sept 2013

CON 7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Page 2: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Important Disclaimers

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.

ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.

IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.

IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS

Page 3: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Introduction to the speakers

Daryl Maier

– 13 years of experience developing and deploying Java SDKs at IBM Canada Lab

– Recent work focus:

• X86 Java just-in-time compiler development and performance

• Java benchmarking

– Contact: [email protected]

Anil Kumar

– 11 years of experience in server Java performance, ensuring the best customer experience on all Intel Architecture-based platforms

– Contact: [email protected]

Page 4: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Why is IPC necessary?

IPC == intra- and inter-process communication between entities

Real applications are often partitioned into independent modules

Clean design, logical separation of entities

Improved QA

Component re-use

Communication with legacy or non-proprietary systems (web services)

But entities need to communicate

Communication latency

Bandwidth

Complex distributed scheduling and deadlocks

Page 5: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

What This Talk Is About

Help developers understand and solve IPC issues in applications with low response-time requirements (milliseconds)

Share our experiences with developing an industry standard Java benchmark

– Software design

– Hardware deployment

– How does it apply to your application or environment?

Page 6: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

IPC Requirements (Architecture)

How is your application structured?

– Loosely or tightly coupled?

– Java only?

Will simple signals do?

Are responses/replies required?

Do you have special transport requirements? (e.g., encryption)

Is response time important?

Java SE only?

Page 7: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Java Building Blocks for IPC (Implementation)

java.net

java.util.concurrent

NIO/NIO2 packages

CORBA

RMI

Page 8: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Key Challenges With IPC

Maintain high throughput and low latency (eliminate bottlenecks)

Scalability of overall application

Avoid deadlocks and starvation

Work around limitations of communication technology

All are even more challenging in environments with tight, low-latency SLAs!

Page 9: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

SPECjbb2013

Next generation Java business logic benchmark from SPEC

Business model is a supermarket supply chain: HQ, suppliers, supermarkets

Scalable, self-injecting workload with multiple supported configurations:

– Composite (standalone)

– Multi-VM (several JVMs on the same host)

– Distributed (several JVMs on multiple hosts)

Throughput ops/sec metric and a response-time metric

Customer-relevant technologies: security, XML, JDK 7, etc.

Designed to share data between entities

Page 10: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Response Time / Throughput Curve

[Figure: response time versus throughput curve, annotated “Increasing load”]

Page 11: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

SPECjbb2013 General Architecture

[Architecture diagram with Controller, TxInjector, and Backend (SUT) sections. Box labels: Controller, Agent, Driver, SP, SM, HQ, Inventory, Company, DB, Config/Results]

Legend:

– Control Entity: benchmark infrastructure

– Business Entity: workload

– Storage: shared state

– Control traffic: start/stop requests, audit requests

– Business Data Traffic: business logic requests, fine-grained status

Page 12: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

SPECjbb2013 Communication Requirements

Separation of workload components

– Intra- and inter-JVM traffic

Control, status, data, and notifications traffic

– Request/Response and one-way Messages

Multiple benchmark deployments

– Scalable architecture

Millisecond response times under high transactional rate

Simple developer API

– Uniform API for communication between agents: intra- or inter-process

Page 13: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

SPECjbb2013 Communications Architecture

[Communications diagram. Labels: JVM1, Driver, TxInjector, IC, Conn Clients, JVM2, Backend, IC, Conn Server, SM, SP, HQ]

Legend:

– Active Entity: SM/HQ/SP, Driver

– Transport: Encapsulation, Compression, Encryption

– Intra-JVM data traffic. Benchmark state: business-logic requests, fine-grained status

– Inter-JVM data traffic. Benchmark state: business-logic requests, fine-grained status

Page 14: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Interconnects

Interconnect is the central communication fabric

– Each JVM has its own IC

Registry handles routing

– Global namespace for all clients

– Local locations added when connected

– Remote locations added lazily as they are resolved

Transports define the format of data transferred between clients

Connectivity defines the exchange format between two ICs

[Diagram labels: JVM, IC, Connectivity, Local Registry, Master Registry, Client 1, Client 2, Client 3]
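The routing rules above can be sketched in a few lines of Java. This is only an illustration, not the SPECjbb2013 code: the class `RegistrySketch`, the `MasterRegistry` interface, and the use of plain strings for client and IC names are all assumptions made for the example. Local locations are recorded at connect time; remote ones are looked up in the master registry on first use and then cached.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-JVM registry; names and types are illustrative only. */
final class RegistrySketch {

    /** Stand-in for the master registry: global client name -> owning IC, or null. */
    interface MasterRegistry {
        String lookup(String clientName);
    }

    private final String localIc;
    private final MasterRegistry master;
    private final Map<String, String> locations = new ConcurrentHashMap<>();

    RegistrySketch(String localIc, MasterRegistry master) {
        this.localIc = localIc;
        this.master = master;
    }

    /** Local locations are added as soon as a client connects. */
    void connect(String clientName) {
        locations.put(clientName, localIc);
    }

    /** Remote locations are resolved lazily, on first use, and then cached. */
    String route(String clientName) {
        String ic = locations.get(clientName);
        if (ic == null) {
            ic = master.lookup(clientName);
            if (ic == null) {
                throw new IllegalArgumentException("unknown client: " + clientName);
            }
            locations.put(clientName, ic);
        }
        return ic;
    }

    boolean isLocal(String clientName) {
        return localIc.equals(route(clientName));
    }
}
```

A sender only ever asks `route(name)`; whether the answer came from the local table or from the master registry is invisible to it.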

Page 15: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Transports

Transports model business data marshallers / unmarshallers

Transport mappings are fixed

– Most frequent transport is PLAIN, i.e., “pass by reference”

Transport overheads can be LARGE; consider making the transport adjustable in your application

Transport layers and their options:

– Encapsulation: Plain, Serial, XML

– Compression: none, GZIP

– Encryption: none, JCE
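To make the cost difference between transports concrete, here is a minimal sketch of two of the combinations listed above: PLAIN (the receiver gets the same object) and Serial encapsulation with optional GZIP compression. It uses only standard `java.io` and `java.util.zip` classes; the class name `TransportSketch` and its method names are invented for the example, and the encryption layer is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical transport sketch; not the benchmark's implementation. */
final class TransportSketch {

    /** PLAIN: no marshalling at all, the receiver sees the very same object. */
    static Object plain(Object payload) {
        return payload;
    }

    /** Serial encapsulation, optionally wrapped in GZIP compression. */
    static byte[] marshal(Serializable payload, boolean gzip) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            OutputStream sink = gzip ? new GZIPOutputStream(bytes) : bytes;
            try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
                out.writeObject(payload);   // closing the stream finishes the GZIP trailer
            }
        } catch (IOException e) {
            throw new IllegalStateException("marshalling failed", e);
        }
        return bytes.toByteArray();
    }

    /** Reverses marshal(); the gzip flag must match the one used when marshalling. */
    static Object unmarshal(byte[] data, boolean gzip) {
        try {
            InputStream source = new ByteArrayInputStream(data);
            if (gzip) {
                source = new GZIPInputStream(source);
            }
            try (ObjectInputStream in = new ObjectInputStream(source)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("unmarshalling failed", e);
        }
    }
}
```

Every layer beyond PLAIN copies the payload at least once on each side, which is why making the transport selectable per deployment is worth the extra design effort.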

Page 16: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Connectivities

Connectivity is the client-server pair delivering data between ICs

Pluggable providers: Grizzly/NIO, Grizzly/HTTP, Jetty/HTTP

Consider pooling outbound client connections to remote servers

Connectivities are complex to design, implement, and test. Consider a pluggable solution for production environments.

[Diagram labels: Clients, IC, Inbound, Outbound, Server, Server Tier 0, Server Tier 1, Server Tier n]
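As an illustration of the pooling advice on this slide, the sketch below keeps a fixed set of outbound connections and lends them to senders one at a time. It is a generic, assumption-laden example: `ConnectionPoolSketch` is a made-up class, the connection type `C` stands in for whatever channel object your connectivity provider uses, and a production pool would also need health checks and shutdown handling.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool of outbound connections of type C. */
final class ConnectionPoolSketch<C> {
    private final BlockingQueue<C> idle;

    ConnectionPoolSketch(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());          // connections are opened once, up front
        }
    }

    /** Borrows a connection, runs the given action on it, and always returns it. */
    <R> R withConnection(Function<C, R> use) {
        final C connection;
        try {
            connection = idle.take();         // blocks while every connection is busy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        try {
            return use.apply(connection);
        } finally {
            idle.add(connection);             // hand the channel back for the next sender
        }
    }
}
```

The pool size bounds the number of concurrent outbound sends to a remote server, so it is one of the knobs that has to be sized against the communication pattern.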

Page 17: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Connection Clients

Clients (or entities) that can connect to an IC

– Agents (communicate with Controller)

– BatchExecutors (process communication in batches in parallel)

Communicate with IC via uplinks and downlinks

Simple communications API

[Diagram: a Connection Client with a Downlink for in-bound traffic and an Uplink for out-bound traffic]
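A simple communications API along these lines might look like the following sketch. The names `UplinkSketch`, `Downlink`, `message`, and `request` are hypothetical, payloads are plain strings, and delivery is simulated inside one JVM with an `ExecutorService`; the point is only the shape of the API: a fire-and-forget call and a call that blocks for a reply.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical uplink; delivery is simulated in-process for illustration. */
final class UplinkSketch {

    /** In-bound side of a client: handles a payload and produces a reply. */
    interface Downlink {
        String receive(String payload);
    }

    private final ExecutorService workers;
    private final Downlink destination;

    UplinkSketch(ExecutorService workers, Downlink destination) {
        this.workers = workers;
        this.destination = destination;
    }

    /** One-way message: no reply expected, the caller does not block. */
    void message(final String payload) {
        workers.execute(() -> destination.receive(payload));
    }

    /** Request: the caller blocks until the reply arrives or the timeout expires. */
    String request(final String payload, long timeoutMillis) {
        Callable<String> task = () -> destination.receive(payload);
        Future<String> reply = workers.submit(task);
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for reply", e);
        } catch (ExecutionException | TimeoutException e) {
            reply.cancel(true);
            throw new IllegalStateException("request failed: " + payload, e);
        }
    }
}
```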

Page 18: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Messages and Requests

Communication delivery:

– messages (no reply expected, non-blocking)

– requests (reply expected, block for response)

Repeated, identical messages/requests to the same destination can be marshalled once and cached by the uplink to improve throughput

– Important for scalability

Batching multiple messages/requests to the same destination in the same packet

– improves throughput and reduces overhead (payload size and marshalling costs)

– needs careful consideration of payload size and flush mechanism so as not to harm SLAs
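The batching trade-off can be sketched as follows. This is an assumed design, not the benchmark's: `BatchingUplinkSketch` and `Wire` are invented names, messages are strings, and the current time is passed in explicitly so the behaviour is easy to reason about. A batch is sent when it is full, or when its oldest message has waited longer than a configured delay, which is the flush mechanism that protects the SLA.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batcher: flushes on batch size or on age of the oldest message. */
final class BatchingUplinkSketch {

    /** Stand-in for the connection that carries one packet to the destination. */
    interface Wire {
        void send(List<String> packet);
    }

    private final Wire wire;
    private final int maxBatch;
    private final long maxDelayNanos;
    private final List<String> pending = new ArrayList<>();
    private long oldestNanos;

    BatchingUplinkSketch(Wire wire, int maxBatch, long maxDelayNanos) {
        this.wire = wire;
        this.maxBatch = maxBatch;
        this.maxDelayNanos = maxDelayNanos;
    }

    /** Queues a message; a full batch is sent immediately. */
    synchronized void enqueue(String message, long nowNanos) {
        if (pending.isEmpty()) {
            oldestNanos = nowNanos;           // remember when this batch started waiting
        }
        pending.add(message);
        if (pending.size() >= maxBatch) {
            flush();
        }
    }

    /** Called periodically (for example by a timer) so a partial batch cannot wait forever. */
    synchronized void tick(long nowNanos) {
        if (!pending.isEmpty() && nowNanos - oldestNanos >= maxDelayNanos) {
            flush();
        }
    }

    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        wire.send(new ArrayList<>(pending));  // one packet for the whole batch
        pending.clear();
    }
}
```

In real use the caller would pass `System.nanoTime()` as the time argument. Larger batches amortize marshalling and per-packet costs, while the maximum delay caps how much latency batching can add to any single message.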

Page 19: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Communication Design FAQ

Why did you choose a message passing scheme?

– Fits the business model

– Allows flexible configurations without changing code

Why didn’t you use the Java Messaging Service (JMS)?

– Java SE not Java EE

– Not a JMS benchmark!

Why did you re-implement a communication scheme rather than re-use an existing one?

– Wanted tight control over performance and bottlenecks

– Re-use complex components, carefully choosing implementations to meet flexibility requirements

Page 20: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Worker Thread Pools

WTPs execute transactions initiated by incoming requests and other transactions

ForkJoin implementation for efficient batch decomposition and work stealing

Each connection client within a JVM shares a single WTP with the other connection clients

– Reduces artificial context switches

Be careful with FJ and threads with blocking I/O

Proper design of WTPs essential for scalability and throughput

[Diagram labels: JVM, IC, Conn Server, SM, SP, HQ, Tier 0, Tier 1, Tier n, Worker Thread Pool (WTP)]
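One standard way to handle the "FJ and blocking I/O" caution is the JDK's `ForkJoinPool.ManagedBlocker` hook, which tells the pool that a worker is about to block so it can compensate with a spare thread. Whether SPECjbb2013 uses this mechanism is not stated in the slides; the sketch below, with the invented helper `BlockingInForkJoinSketch.callBlocking`, only shows how the hook is used.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.function.Supplier;

/** Hypothetical helper for running a blocking call from a ForkJoin worker. */
final class BlockingInForkJoinSketch {

    /** Adapts a blocking call to the ManagedBlocker protocol. */
    static final class BlockingCall<T> implements ForkJoinPool.ManagedBlocker {
        private final Supplier<T> call;
        private T result;
        private boolean done;

        BlockingCall(Supplier<T> call) {
            this.call = call;
        }

        @Override
        public boolean block() {
            result = call.get();      // the potentially blocking operation
            done = true;
            return true;              // true: no further blocking is necessary
        }

        @Override
        public boolean isReleasable() {
            return done;
        }

        T result() {
            return result;
        }
    }

    /** Runs the call, letting the pool add a spare worker while this thread blocks. */
    static <T> T callBlocking(Supplier<T> call) {
        BlockingCall<T> blocker = new BlockingCall<>(call);
        try {
            ForkJoinPool.managedBlock(blocker);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during blocking call", e);
        }
        return blocker.result();
    }
}
```

Without such a hook, a worker stuck in blocking I/O silently reduces the pool's effective parallelism, which is one way a shared WTP can stall under load.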

Page 21: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Deadlocks

[Diagram: JVM 1 contains IC1 (Client, Server) and WTP 1; JVM 2 contains IC2 (Server, Client) and WTP 2. Request 1 and Request 2 pass between the two JVMs.]

Potential deadlock if Request 2 is queued and all threads in WTP 1 are waiting for results from remote WTP 2.

Solutions?

– Complicated distributed deadlock avoidance strategies

– Rework transactions to eliminate cycles

Page 22: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Communication Tiers

Introduce tiers to connectivity servers and thread pools

Architected to guarantee progress by ensuring there is always an available thread

Must be sized according to communication patterns

Communication requests:

– within the same IC are sent on the same tier as requestor thread

– to a remote IC are sent to the next higher tier

– for infrastructure/control are sent on a reserved tier

Destination tiers are transparent to sender
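The tier rule for remote requests can be demonstrated with ordinary thread pools. In this sketch (all names hypothetical, both "JVMs" collapsed into one process for brevity) each tier has its own pool, and a request issued from a worker on tier N always runs on tier N + 1. Because a blocked worker only ever waits on a higher tier, a chain of nested requests cannot wait on its own pool, even with one thread per tier.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical tiered worker pools; tiers[i] serves tier i + 1. */
final class TieredPoolsSketch {
    // Tier of the worker running the current thread; 0 means "not a worker thread".
    private static final ThreadLocal<Integer> CURRENT_TIER = ThreadLocal.withInitial(() -> 0);

    private final ExecutorService[] tiers;

    TieredPoolsSketch(int tierCount, int threadsPerTier) {
        tiers = new ExecutorService[tierCount];
        for (int i = 0; i < tierCount; i++) {
            tiers[i] = Executors.newFixedThreadPool(threadsPerTier);
        }
    }

    /** Blocking remote request: always executed one tier above the caller's tier. */
    <T> T remoteRequest(final Callable<T> work) {
        final int tier = CURRENT_TIER.get() + 1;
        if (tier > tiers.length) {
            throw new IllegalStateException(
                    "call chain needs tier " + tier + " but only " + tiers.length + " are configured");
        }
        Callable<T> onTier = () -> {
            CURRENT_TIER.set(tier);   // the sender never names a tier; it is derived here
            return work.call();
        };
        try {
            return tiers[tier - 1].submit(onTier).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed on tier " + tier, e.getCause());
        }
    }

    void shutdown() {
        for (ExecutorService tier : tiers) {
            tier.shutdownNow();
        }
    }
}
```

The number of tiers must cover the deepest request chain in the application, which is the "sized according to communication patterns" requirement; the sketch fails fast when a chain is deeper than the configuration allows.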

Page 23: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Communication With Tiers

[Diagram: JVM 1 contains IC1 (Client, Server) and WTP 1; JVM 2 contains IC2 (Server, Client) and WTP 2. Annotations:]

– Request 1 from Tier 1

– Request 1 arrives on Tier 1

– Request 1 routed to Tier 2 WTP

– Request 2 from Tier 2

– Request 2 arrives on Tier 2

– Request 2 routed to Tier 3 WTP

Page 24: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Debugging Advice

Keep the communication API simple

Avoid aggregating timeout failures

– a timeout rarely reveals the underlying cause

Using I/O to debug may introduce latency that produces an artificial problem

Stress testing on large systems is critical for evaluating performance and overall design

Page 25: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Factors That Affect Response Time

Page 26: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Deployment Decisions

HW configuration:

– On systems with multiple network cards, network latency can be improved by affinitizing network traffic to particular network cards and directing it to specific processors

– Systems can be connected via InfiniBand or fiber optic links instead of regular Ethernet cable

Native OS:

– Within a single OS image: can take advantage of loopback traffic optimizations

– Across multiple OS images: be careful about traffic routing

Virtual OS images:

– Affinitize each VM to cores rather than leaving it free-floating

– Be careful about I/O across virtual OS images

Process level:

– A process affinitized to a socket delivers more consistent response times

Page 27: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Tunings: GC

If long latencies are tracking with GC pauses, tune your garbage collector

– Run with verbose GC

– GC policies, Heap size, Nursery size

Match with GC log time stamps
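Besides running with `-verbose:gc` and reading the log, an application can sample the standard `java.lang.management` beans to check whether a slow operation overlapped with garbage collection. The helper below is a hypothetical sketch (`GcSnapshotSketch` is not part of any library); note that `getCollectionTime()` reports accumulated collection time as the JVM defines it, which for concurrent collectors is not necessarily pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper for correlating an operation with GC activity. */
final class GcSnapshotSketch {

    /** Accumulated GC time in milliseconds, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();   // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Runs the operation and returns how much GC time accrued while it ran. */
    static long gcMillisDuring(Runnable operation) {
        long before = totalGcMillis();
        operation.run();
        return totalGcMillis() - before;
    }
}
```

If a request that missed its response-time target shows a non-zero value here, the next step is to find the matching entry in the GC log by time stamp and look at the policy, heap size, and nursery size.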

Page 28: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Tunings: Time outs

Ensure communication time outs are appropriately sized

– Too large: the application may not react quickly enough, affecting response time

– Too small: normal workload delays are not tolerated

Time outs must scale when moved to larger deployments

Architect an appropriate response when something times out

– Retry mechanism?
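One possible answer to the "retry mechanism?" question is a bounded retry with a per-attempt timeout, sketched below. The class `RetrySketch` and its parameters are illustrative assumptions. Retrying is only safe when the request is idempotent, and the last failure is kept as the cause so that the timeout does not hide the underlying problem.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical bounded retry with a timeout on every attempt. */
final class RetrySketch {

    static <T> T callWithRetry(ExecutorService pool, Callable<T> call,
                               long timeoutMillis, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<T> reply = pool.submit(call);
            try {
                return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                reply.cancel(true);           // do not leave the slow attempt running
                last = e;
            } catch (ExecutionException e) {
                last = e;                     // keeps the underlying cause for diagnosis
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for reply", e);
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

The worst-case latency of such a call is roughly `timeoutMillis * maxAttempts`, so both values need to be sized together against the response-time target and revisited when the deployment grows.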

Page 29: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Tuning: Communications

Number of connection pools : Number of sockets

Number of Grizzly pools : Number of threads in each Grizzly pool

Number of ForkJoin workers:

– Usually needs to be set higher than the number of logical processors
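A small sketch of making the ForkJoin worker count tunable: the system property name `ipc.fj.workers` and the default of twice the logical processor count are assumptions chosen for the example, not values from the talk. The only point taken from the slide is that the count usually has to exceed the number of logical processors.

```java
import java.util.concurrent.ForkJoinPool;

/** Hypothetical sizing helper; property name and 2x default are illustrative. */
final class ForkJoinSizingSketch {

    /** Uses the override if present, otherwise twice the logical processor count. */
    static int workerCount(int logicalProcessors, String override) {
        if (override != null) {
            return Integer.parseInt(override.trim());
        }
        return 2 * logicalProcessors;
    }

    static ForkJoinPool newPool() {
        int processors = Runtime.getRuntime().availableProcessors();
        return new ForkJoinPool(workerCount(processors, System.getProperty("ipc.fj.workers")));
    }
}
```

Launching with, say, `-Dipc.fj.workers=48` would then let the value be tuned per deployment without a code change.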

Page 30: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Tuning: Timing APIs

Watch for too much contention on timing APIs when HPET is being used
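A quick way to check whether timer calls are expensive on a given system is to measure them directly. The sketch below (the class `TimerCostSketch` is invented for this example) reports the average cost of `System.nanoTime()` on the calling thread; it is a rough probe rather than a rigorous benchmark, and to look for contention it would have to be run from several threads at once and the results compared.

```java
/** Hypothetical probe for the per-call cost of System.nanoTime(). */
final class TimerCostSketch {
    // Written once per measurement so the JIT cannot discard the timer calls.
    static volatile long sink;

    /** Average nanoseconds per System.nanoTime() call, measured on this thread. */
    static double nanosPerCall(int calls) {
        long accumulated = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            accumulated += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        sink = accumulated;
        return (double) elapsed / calls;
    }
}
```

If the per-call cost is high, or rises sharply as more threads run the probe, it may be worth checking which clock source the operating system is using and reducing how often the hot path reads the clock.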

Page 31: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Top 10 Takeaways

10 Reduce communication as much as possible

9 Send big chunks of data

8 Combine messages in batches

7 If you care about throughput, use an asynchronous model

6 If you care about strict correctness, use a synchronous model

5 Re-use connection channels: do not create new ones for each message

4 Check errors: IPC is difficult to debug, so log every error in as much detail as possible

3 Use frameworks when appropriate

2 Profile and tune

1 Don’t believe it when someone says “It will never happen”

Page 32: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Questions?

http://ibm.co/JavaOne2013

Visit the IBM booth and meet other IBM developers at JavaOne 2013

Page 33: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

References and Credits

Some benchmark slides and diagrams were adapted from SPECjbb2013 Design Committee documentation and correspondence.

Aleksey Shipilev (Oracle) for communication architecture diagrams

Page 34: JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Latency Deployments

Trademarks

SPECjbb®2013 is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.