© 2013 IBM Corporation
26 Sept 2013
CON 7370: Java Interprocess Communication Challenges in Low-Latency Deployments
Important Disclaimers
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION
CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED.
ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED
ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR
INFRASTRUCTURE DIFFERENCES.
ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT
PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE
USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR
SUPPLIERS AND/OR LICENSORS
Introduction to the speakers
Daryl Maier
– 13 years experience developing and deploying Java SDKs at IBM Canada Lab
– Recent work focus:
• X86 Java just-in-time compiler development and performance
• Java benchmarking
– Contact: maier@ca.ibm.com
Anil Kumar
– 11 years experience in server Java performance ensuring best customer experience
on all Intel Architecture based platforms
– Contact: anil.kumar@intel.com
Why is IPC necessary?
IPC == intra- and inter-process communication between entities
Real applications are often partitioned into independent modules
Clean design, logical separation of entities
Improved QA
Component re-use
Communication with legacy or non-proprietary systems (web services)
But entities need to communicate
Communication latency
Bandwidth
Complex distributed scheduling and deadlocks
What This Talk Is About
Help developers understand and solve IPC issues in applications with millisecond-level
response time requirements
Share our experiences with developing an industry standard Java benchmark
– Software design
– Hardware deployment
– How does it apply to your application or environment?
IPC Requirements (Architecture)
How is your application structured?
– Loosely or tightly coupled?
– Java only?
Will simple signals do?
Are responses or replies required?
Do you have special transport requirements? (e.g., encryption)
Is response time important?
Java SE only?
Java Building Blocks for IPC (Implementation)
java.net
java.util.concurrent
NIO/NIO2 packages
CORBA
RMI
Key Challenges With IPC
Maintain high throughput and low latency (eliminate bottlenecks)
Scalability of overall application
Avoid deadlocks and starvation
Work around limitations of communication technology
All are even more challenging in environments with tight response-time SLAs!
SPECjbb2013
Next generation Java business logic benchmark from SPEC
Business model is a supermarket supply chain: HQ, suppliers, supermarkets
Scalable, self-injecting workload with multiple supported configurations:
– Composite (standalone)
– Multi-VM (several JVMs on the same host)
– Distributed (several JVMs on multiple hosts)
Throughput ops/sec metric and a response-time metric
Customer-relevant technologies: security, XML, JDK 7, etc.
Designed to share data between entities
Response Time / Throughput Curve
[Figure: response time / throughput curve, plotted against increasing load]
SPECjbb2013 General Architecture
[Diagram: three components, Controller, TxInjector, and Backend (SUT). Labels recovered from the diagram: Controller, Agent, Driver, SM, SP, HQ, Inventory, Company, DB, Config, Results]
Legend:
– Control entity: benchmark infrastructure
– Business entity: workload
– Storage: shared state
– Control traffic: start/stop requests, audit requests
– Business data traffic: business-logic requests, fine-grained status
SPECjbb2013 Communication Requirements
Separation of workload components
– Intra- and inter-JVM traffic
Control, status, data, and notifications traffic
– Request/Response and one-way Messages
Multiple benchmark deployments
– Scalable architecture
Millisecond response times under high transactional rate
Simple developer API
– Uniform API for communication between agents: intra- or inter-process
SPECjbb2013 Communications Architecture
[Diagram: JVM1 hosts the TxInjector (Driver) and JVM2 hosts the Backend (SM, SP, HQ). Each JVM has an IC with Conn clients and a Conn server]
Legend:
– Active entity: SM/HQ/SP, Driver
– Transport: encapsulation, compression, encryption
– Intra-JVM data traffic (benchmark state): business-logic requests, fine-grained status
– Inter-JVM data traffic (benchmark state): business-logic requests, fine-grained status
Interconnects
Interconnect is the central communication fabric
– Each JVM has its own IC
Registry handles routing
– Global namespace for all clients
– Local locations added when connected
– Remote locations added lazily as they are resolved
Transports define the format of data transferred between clients
Connectivity defines the exchange format between two ICs
[Diagram labels: JVM, IC, Connectivity, Local Registry, Master Registry, Client 1, Client 2, Client 3]
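The registry behaviour described above (local locations added on connect, remote locations resolved lazily) could be sketched roughly as follows. This is a minimal illustration, not the SPECjbb2013 implementation: `SketchRegistry`, `registerLocal`, `resolve`, and the `masterLookup` function are all hypothetical names, and locations are plain strings for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-JVM registry; names are illustrative only.
class SketchRegistry {
    private final Map<String, String> locations = new ConcurrentHashMap<>();
    // Stands in for a (possibly expensive) query to the master registry.
    private final Function<String, String> masterLookup;

    SketchRegistry(Function<String, String> masterLookup) {
        this.masterLookup = masterLookup;
    }

    // Local clients are recorded as soon as they connect.
    void registerLocal(String clientName) {
        locations.put(clientName, "local");
    }

    // Remote clients are looked up on first use and cached afterwards.
    String resolve(String clientName) {
        return locations.computeIfAbsent(clientName, masterLookup);
    }
}
```

With this shape, repeated sends to the same remote client pay the lookup cost only once per JVM.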
Transports
Transports model business data marshallers / unmarshallers
Transport mappings are fixed
– Most frequent transport is PLAIN, i.e., “pass by reference”
Transport overheads can be LARGE; consider making the transport adjustable in your application
Transport layers and their options:
– Encapsulation: Plain, Serial, XML
– Compression: none, GZIP
– Encryption: none, JCE
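To make the transport overhead concrete, here is a rough sketch of a serializing transport with optional GZIP compression. It assumes standard Java serialization for the "Serial" encapsulation; the class and method names are hypothetical, and the encryption layer is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical transport; not the SPECjbb2013 implementation.
class SketchTransport {
    // Marshal with Java serialization, optionally wrapped in GZIP.
    static byte[] marshal(Serializable payload, boolean compress) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            OutputStream out = compress ? new GZIPOutputStream(bytes) : bytes;
            try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
                oos.writeObject(payload);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reverse the steps above in the opposite order.
    static Object unmarshal(byte[] data, boolean compressed) {
        try {
            InputStream in = new ByteArrayInputStream(data);
            if (compressed) {
                in = new GZIPInputStream(in);
            }
            try (ObjectInputStream ois = new ObjectInputStream(in)) {
                return ois.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every layer copies and transforms the payload, which a plain pass-by-reference transport avoids entirely. That cost difference is the reason to keep the transport choice adjustable.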
Connectivities
Connectivity is the client-server pair delivering data between ICs
Pluggable providers: Grizzly/NIO, Grizzly/HTTP, Jetty/HTTP
Consider pooling outbound client connections to remote servers
Connectivities are complex to design, implement, and test. Consider a pluggable solution for production environments.
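Pooling outbound connections can be as simple as a bounded queue of pre-opened connections. The sketch below is a generic illustration under that assumption; `SketchConnectionPool` and its `connector` supplier are hypothetical and unrelated to the Grizzly or Jetty APIs.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fixed-size pool of outbound connections of type C.
class SketchConnectionPool<C> {
    private final BlockingQueue<C> idle;

    // Open all connections up front so no message pays the connect cost.
    SketchConnectionPool(int size, Supplier<C> connector) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(connector.get());
        }
    }

    // Borrow a connection; returns null if none becomes free within the timeout.
    C acquire(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Hand the connection back for re-use by other senders.
    void release(C connection) {
        idle.offer(connection);
    }
}
```

A production pool would also need health checks and reconnection, which is part of why a pluggable, well-tested provider is attractive.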
[Diagram labels: Clients, IC, Inbound, Outbound, Server, Server Tier 0, Server Tier 1, Server Tier n]
Connection Clients
Clients (or entities) that can connect to an IC
– Agents (communicate with Controller)
– BatchExecutors (process communication in batches in parallel)
Communicate with IC via uplinks and downlinks
Simple communications API
[Diagram: a Connection Client with a downlink for inbound traffic and an uplink for outbound traffic]
Messages and Requests
Communication delivery:
– messages (no reply expected, non-blocking)
– requests (reply expected, block for response)
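The difference between the two delivery styles can be sketched as follows. This is only an illustration: `SketchUplink`, `sendMessage`, and `sendRequest` are hypothetical names, and a local thread pool stands in for the remote destination.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical uplink; the names are illustrative and not the SPECjbb2013 API.
class SketchUplink {
    // Stands in for the destination's worker threads.
    private final ExecutorService destination = Executors.newFixedThreadPool(2);
    final AtomicInteger delivered = new AtomicInteger();

    // Message: fire-and-forget. The caller returns immediately.
    void sendMessage(String payload) {
        destination.execute(() -> delivered.incrementAndGet());
    }

    // Request: the caller blocks until the reply arrives or the timeout expires.
    String sendRequest(String payload, long timeoutMillis) {
        Future<String> reply = destination.submit(() -> "reply:" + payload);
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("no reply for " + payload, e);
        }
    }

    // Stop accepting work and wait briefly for queued messages to be delivered.
    void shutdown() {
        destination.shutdown();
        try {
            destination.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```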
Repeated, identical messages/requests to the same destination can be marshalled once and
cached by the uplink to improve throughput
– Important for scalability
Batching multiple messages/requests to the same destination in the same packet
– Improves throughput and reduces overhead (payload size and marshalling costs)
– Needs careful consideration of payload size and flush mechanism so as not to harm SLAs
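One possible shape for batching with a size limit and an age limit is sketched below. All names and parameters (`BatchingUplink`, `maxBatchSize`, `maxDelayMillis`, the `wire` consumer) are assumptions made for illustration. For brevity the age check only runs when a new message is sent; a real implementation would also need a timer so that a lone message is not held indefinitely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batching uplink; illustrative only.
class BatchingUplink {
    private final int maxBatchSize;
    private final long maxDelayNanos;
    private final Consumer<List<String>> wire; // stands in for the real transport
    private final List<String> pending = new ArrayList<>();
    private long oldestNanos;

    BatchingUplink(int maxBatchSize, long maxDelayMillis, Consumer<List<String>> wire) {
        this.maxBatchSize = maxBatchSize;
        this.maxDelayNanos = maxDelayMillis * 1_000_000L;
        this.wire = wire;
    }

    // Queue a message; flush when the batch is full or its oldest entry is too old.
    synchronized void send(String message) {
        if (pending.isEmpty()) {
            oldestNanos = System.nanoTime();
        }
        pending.add(message);
        boolean full = pending.size() >= maxBatchSize;
        boolean stale = System.nanoTime() - oldestNanos >= maxDelayNanos;
        if (full || stale) {
            flush();
        }
    }

    // Send everything queued so far as a single packet.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        wire.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

The two limits pull in opposite directions: a larger batch amortizes more overhead, while a shorter maximum delay protects response time.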
Communication Design FAQ
Why did you choose a message passing scheme?
– Fits the business model
– Allows flexible configurations without changing code
Why didn’t you use the Java Message Service (JMS)?
– The benchmark targets Java SE, not Java EE
– Not a JMS benchmark!
Why did you re-implement a communication scheme rather than re-use an existing one?
– Wanted tight control over performance and bottlenecks
– Complex components are re-used, with implementations chosen carefully to meet flexibility requirements
Worker Thread Pools
Worker thread pools (WTPs) execute transactions initiated by incoming requests and other transactions
ForkJoin implementation for efficient batch decomposition and work stealing
Each connection client within a JVM shares a single WTP with the other connection clients
– Reduces artificial context switches
Be careful when combining ForkJoin (FJ) with threads that perform blocking I/O
Proper design of WTPs is essential for scalability and throughput
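One standard way to make blocking calls safer inside a ForkJoin pool is the JDK's `ForkJoinPool.ManagedBlocker` interface, which lets the pool know a worker is about to block so it may compensate with a spare thread. The sketch below follows the pattern from the JDK documentation; `BlockingTake` is a hypothetical helper, and whether the benchmark uses this approach is not stated in the slides.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper: a blocking queue take wrapped as a ManagedBlocker.
class BlockingTake<T> implements ForkJoinPool.ManagedBlocker {
    private final BlockingQueue<T> queue;
    private T item;

    BlockingTake(BlockingQueue<T> queue) {
        this.queue = queue;
    }

    // Called by the pool when blocking is unavoidable.
    @Override
    public boolean block() throws InterruptedException {
        if (item == null) {
            item = queue.take();
        }
        return true;
    }

    // Non-blocking check: succeeds if an item is already available.
    @Override
    public boolean isReleasable() {
        return item != null || (item = queue.poll()) != null;
    }

    // Convenience wrapper usable from inside or outside a ForkJoin worker.
    static <T> T take(BlockingQueue<T> queue) {
        BlockingTake<T> blocker = new BlockingTake<>(queue);
        try {
            ForkJoinPool.managedBlock(blocker);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return blocker.item;
    }
}
```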
[Diagram labels: JVM, IC, Conn, Server, SM, SP, HQ, Tier 0, Tier 1, Tier n, Worker Thread Pool (WTP)]
Deadlocks
[Diagram: JVM 1 (IC1, client, server, WTP 1) sends Request 1 to JVM 2 (IC2, client, server, WTP 2), which sends Request 2 back to JVM 1]
Potential deadlock if Request 2 is queued and all threads in WTP 1 are waiting for results from remote WTP 2.
Solutions?
– Complicated distributed deadlock avoidance strategies
– Rework transactions to eliminate cycles
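The same failure mode can be reproduced inside one JVM, which makes it easier to experiment with. In the sketch below (hypothetical class and method names), a task running on a pool submits a nested task to the same pool and waits for it. With one thread the nested task can never start; a timeout is used so the demonstration ends instead of hanging.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Single-JVM analogy of the two-JVM deadlock; illustrative only.
class PoolExhaustionDemo {
    // Returns true if the nested request completed, false if it timed out.
    static boolean nestedRequestCompletes(int poolSize, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Future<Boolean> outer = pool.submit(() -> {
                // The outer task holds a worker while it waits for the inner one.
                Future<String> inner = pool.submit(() -> "reply");
                try {
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false; // inner task never got a thread
                }
            });
            return outer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling `nestedRequestCompletes(1, 200)` times out, while a pool of two threads completes. In a distributed deployment the same starvation occurs whenever every worker is waiting on a reply that needs one of those workers.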
Communication Tiers
Introduce tiers to connectivity servers and thread pools
Architected to guarantee progress by ensuring there is always an available thread
Must be sized according to communication patterns
Communication requests:
– within the same IC are sent on the same tier as requestor thread
– to a remote IC are sent to the next higher tier
– for infrastructure/control are sent on a reserved tier
Destination tiers are transparent to sender
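A rough single-JVM sketch of the tier rule follows: each tier owns its threads, and a request issued by a thread on tier n is executed on tier n+1, so a waiting thread never competes with the work it is waiting for. `TieredPools` and its methods are hypothetical names, and the reserved infrastructure tier is omitted.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical tiered pools; not the SPECjbb2013 implementation.
class TieredPools {
    private final ExecutorService[] tiers;

    TieredPools(int tierCount, int threadsPerTier) {
        tiers = new ExecutorService[tierCount];
        for (int i = 0; i < tierCount; i++) {
            tiers[i] = Executors.newFixedThreadPool(threadsPerTier);
        }
    }

    // Work arriving from outside starts on the lowest tier.
    <T> Future<T> submitExternal(Callable<T> work) {
        return tiers[0].submit(work);
    }

    // Work issued by a thread on requestorTier runs one tier higher.
    <T> Future<T> submitFromTier(int requestorTier, Callable<T> work) {
        int target = requestorTier + 1;
        if (target >= tiers.length) {
            throw new IllegalStateException("call chain is deeper than the configured tiers");
        }
        return tiers[target].submit(work);
    }

    void shutdown() {
        for (ExecutorService tier : tiers) {
            tier.shutdownNow();
        }
    }

    // A three-step call chain that completes even with one thread per tier.
    static String runChain(int threadsPerTier) {
        TieredPools pools = new TieredPools(3, threadsPerTier);
        try {
            Callable<String> tier2Work = () -> "C";
            Callable<String> tier1Work = () -> "B>" + pools.submitFromTier(1, tier2Work).get();
            Callable<String> tier0Work = () -> "A>" + pools.submitFromTier(0, tier1Work).get();
            return pools.submitExternal(tier0Work).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pools.shutdown();
        }
    }
}
```

The number of tiers bounds the depth of the longest call chain, which is why tiers must be sized according to the application's communication patterns.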
Communication With Tiers
[Diagram: JVM 1 (IC1, client, server, WTP 1) and JVM 2 (IC2, client, server, WTP 2)]
– Request 1 is sent from Tier 1, arrives on Tier 1, and is routed to the Tier 2 WTP
– Request 2 is sent from Tier 2, arrives on Tier 2, and is routed to the Tier 3 WTP
Debugging Advice
Keep the communication API simple
Avoid aggregating timeout failures
– a timeout rarely reveals the underlying cause
Using I/O to debug may introduce latency that produces an artificial problem
Stress testing on large systems is critical for evaluating performance and overall design
Factors That Affect Response Time
Deployment Decisions
HW configuration:
– On systems with multiple network cards, network latency can be improved by affinitizing
network traffic to particular network cards and directing that traffic to specific processors
– Systems can be connected via InfiniBand or fiber optic links instead of regular Ethernet cable
Native OS:
– Within a single OS image: can take advantage of loopback traffic optimizations
– Across multiple OS images: be careful about traffic routing
Virtual OS images:
– Affinitize VMs to cores rather than leaving them free-floating
– Be careful about I/O across virtual OS images
Process level:
– A process affinitized to a socket delivers more consistent response times
Tunings: GC
If long latencies are tracking with GC pauses, tune your garbage collector
– Run with verbose GC
– GC policies, Heap size, Nursery size
Match latency spikes against GC log time stamps
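Besides reading a `-verbose:gc` log, GC activity can be sampled from inside the application with the standard `java.lang.management` API and reported next to each slow operation. The sketch below is one way to do that; `GcCorrelation` and its methods are hypothetical names. The reported collection time is cumulative and approximate, and for concurrent collectors it may not equal pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for relating an operation's latency to GC activity.
class GcCorrelation {
    // Total collection time accumulated so far across all collectors, in ms.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Runs the operation and returns { elapsed ms, GC ms accumulated meanwhile }.
    static long[] timeWithGc(Runnable operation) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        operation.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new long[] { elapsedMillis, totalGcMillis() - gcBefore };
    }
}
```

If slow operations consistently show a large second value, GC tuning is a reasonable next step; if not, the latency is probably coming from somewhere else.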
Tunings: Time outs
Ensure communication time outs are appropriately sized
– Too large: failures are not detected quickly enough, affecting response time
– Too small: normal workload delays are not tolerated
Time outs must scale when moved to larger deployments
Architect an appropriate response when something times out
– Retry mechanism?
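A bounded retry on timeout might look like the sketch below. It is a minimal illustration with hypothetical names (`TimedRequest`, `callWithRetry`), and it assumes the request is idempotent, since a timed-out attempt may still have been partly processed by the receiver.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: retry a request a bounded number of times on timeout.
class TimedRequest {
    static <T> T callWithRetry(ExecutorService pool, Callable<T> request,
                               long timeoutMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<T> reply = pool.submit(request);
            try {
                return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Abandon this attempt and interrupt its worker before retrying.
                reply.cancel(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                // A real failure is reported as-is rather than hidden behind a retry.
                throw new IllegalStateException("request failed", e.getCause());
            }
        }
        throw new IllegalStateException("no reply after " + maxAttempts + " attempts");
    }
}
```

Keeping timeouts and genuine failures on separate paths also supports the earlier debugging advice, because the final error says which of the two actually happened.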
Tuning: Communications
Number of connection pools : Number of sockets
Number of grizzly pools : Number of threads in each Grizzly pool
Number of ForkJoin workers:
– Usually needs to be set higher than the number of logical processors
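Sizing a ForkJoin pool relative to the logical processor count can be sketched as below. The `oversubscription` factor is a hypothetical tuning parameter introduced for illustration; the right value depends on how often workers block and has to be found by measurement.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical sizing helper; the factor is an assumption to be tuned by measurement.
class WorkerSizing {
    static ForkJoinPool newWorkerPool(double oversubscription) {
        int logicalProcessors = Runtime.getRuntime().availableProcessors();
        // Round up so that a factor above 1.0 always adds at least one worker.
        int workers = Math.max(1, (int) Math.ceil(logicalProcessors * oversubscription));
        return new ForkJoinPool(workers);
    }
}
```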
Tuning: Timing APIs
Watch for too much contention on timing APIs when HPET is being used
Top 10 Takeaways
10 Reduce communication as much as possible
9 Send big chunks of data
8 Combine messages in batches
7 If you care about throughput, use an asynchronous model
6 If you care about strict correctness, use a synchronous model
5 Re-use connection channels: do not create new ones for each message
4 Check errors: IPC is difficult to debug, so log every error in as much detail as possible
3 Use frameworks when appropriate
2 Profile and tune
1 Don’t believe it when someone says “It will never happen”
Questions?
http://ibm.co/JavaOne2013
Visit the IBM booth and meet other IBM developers at JavaOne 2013
References and Credits
Some benchmark slides and diagrams were adapted from SPECjbb2013 Design Committee
documentation and correspondence.
Aleksey Shipilev (Oracle) for communication architecture diagrams
Trademarks
SPECjbb®2013 is a registered trademark of the Standard Performance Evaluation
Corporation (SPEC).
Java and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle and/or its affiliates.