Grids@Work V
Oracle Coherence for Finance Applications
Ewan Slater
Senior Solution Specialist
EMEA Technology Fusion Middleware
Topics
• Scalability – why do we care?
• Scalability – what’s the problem?
• Traditional approaches and their drawbacks
• The Coherence approach
• What is Coherence?
• Where does Coherence fit?
• How Coherence works
• Using Coherence
• Coherence in Action
• Conclusion
• Q & A
Scalability – why do we care?
IT Initiatives Driving Scalability Demand
• XTP
  • Highest volume, low latency, absolute transactional integrity
• Virtualization
  • Increased demand on Data Sources
  • Application re-provisioning must occur transparently without interruption of data access
  • Must handle multiple load increases at the same time
• SOA
  • Increasing common access to resources
  • Sharing access means continuous availability and absolute reliability
• EDA
  • Event driving transactions causing massive increase in load
  • Pervasiveness driving data need across all systems affected
[Chart: demand vs. supply of resources over time]
Compute Power: SMP/Multicore
Memory Arrives: “In Memory Option”
Network Speed: GbE/10G/IB
Storage: Flexibility
Hardware Capacity Impact
The more people have, the more they want!
Availability – Continuous
Reliability – Transactional Integrity
Scalability – Capacity on Demand
Performance – Zero Latency
Enterprise Infrastructure Requirements
Grid Automation
Service Level Management
Application Performance Mgmt
Provisioning
Enterprise Manageability Requirements
Service Oriented Architecture
Web 2.0
Event Driven Architecture
Extreme Transaction Volumes
Software Framework Pressures
Scalability – what’s the problem?
In general, applications don’t scale well…
…what worked fine in development, or for 50 users…
…can’t cope with production demand…
…that increases over time…
Why don’t applications scale?
• Single points of failure (SPOF)
  • Database failure or pause = application failure or pause
  • One server fails, the entire system fails
  • One application or JVM fails, the application fails
• Single points of bottleneck (SPOB)
  • Shared resources
  • The “hub” of Hub-and-spoke architectures
  • Heavy database or disk I/O
• Applications are not designed to scale
  • It works in single-user testing on a PC, but will it work in production?
  • Scaling is often an afterthought – “it’s the DBA’s problem”
Scaling the Application Tier: Traditional Approaches
Scale up (or even bigger boxes)
Approach: Scale-Up (“It’s an infrastructure problem”)
How:
• Buy big boxes
• Increase resources (CPU, memory, HDD capacity, speed and network, etc.)
• Buy specialized hardware (Azul, InfiniBand…)
Advantages:
• Simple (overnight)
• No development
• No impact on internal design
Disadvantages:
• Expensive
• Will hit physical limits
• Will have to redesign at limit
• Non-graceful deterioration at limit
• Stop, Add, Restart required to scale
Bigger box = Much Bigger price tag!!!
• High incremental cost
• Wasted capacity
At some point, even the biggest box has its limits!
Stateless application tier (or blame the DBA)
Approach: Stateless Scale-Out (“Push state scale-out into lower Data Source layer”; “It’s the DBA’s problem”)
How:
• Make application stateless (e.g. stateless sessions)
• Use lots of stateless servers
• Use load-balancing
• Use “big” and “scalable” Data Source to ensure application state scale-out
Advantages:
• Easy to develop (not overnight, but relatively simple as no state is managed)
• Scale-out is easy, just add more servers
Disadvantages:
• Only scales to match underlying Data Source performance
• When underlying limit is reached, have to redesign
• Network bottlenecks experienced as data is moved between layers
Performance Bottleneck Between Tiers
A HUGE performance bottleneck:
Volume / Complexity / Frequency of Data Access
[Diagram: Application (Java, object) on one side, Database (SQL, relational) on the other]
Performance Bottleneck Between Tiers
Solution: Move relevant data to the middle tier
[Diagram: Java application on several Application Servers, each holding objects in a Memory Cache, in front of the relational database]
• One solution is to keep the object data in object form in a high-speed distributed memory cache
• The database remains the system of record (persistence)
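The idea on this slide can be sketched in a few lines. This is a minimal, single-process illustration in plain Python, not the Coherence API: `CachedRepository`, `load_from_db` and the account data are hypothetical names invented for the example. Reads are served from memory in object form, and the database is consulted only on a miss.

```python
from typing import Callable, Dict, Optional


class CachedRepository:
    """Serve reads from an in-memory object cache; fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]]):
        self._load_from_db = load_from_db  # hypothetical loader standing in for a SQL query
        self._cache: Dict[str, dict] = {}  # objects kept in object form, in memory
        self.db_reads = 0                  # counts trips to the system of record

    def get(self, key: str) -> Optional[dict]:
        if key in self._cache:
            return self._cache[key]        # cache hit: no database access
        self.db_reads += 1
        value = self._load_from_db(key)    # cache miss: go to the database
        if value is not None:
            self._cache[key] = value
        return value


# A dict stands in for the relational database (assumption for the sketch).
database = {"acct-1": {"owner": "alice", "balance": 100}}
repo = CachedRepository(database.get)
first = repo.get("acct-1")   # loaded from the database
second = repo.get("acct-1")  # served from memory
```

Only the first read reaches the database; how updates are kept consistent across several such caches is exactly the problem the following slides turn to.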
Caching in the application
Approach: Caching (“Keep recent copies of state”; “We’ll save the DB and DBA by caching”)
How:
• Application keeps local copies (in memory or on local disk) of recently / commonly used state
Advantages:
• Seems simple
• Reduces Data Source and Network load
• Significant application performance improvements
Disadvantages:
• Maintaining consistency of data between Local and Data Source instances can be difficult
• Requires “messaging infrastructure” to ensure consistency across a cluster (and application development)
• Typically applicable to “read only” applications and not “write a lot” applications
• Easy to get wrong
Local Caching
Can be scaled out…Farm Caching
Inconsistent Local Cache
Farm Caching
• Benefits:
  • Same as Local Cache
  • May now scale out
• Constraints:
  • Same as Local Cache - but now worse - across Farm!
  • Singularity broken between members (Incoherent)
  • Members have own copies of Entries
  • No cost savings in making copies to members
  • Cache capacity doesn’t increase with Farm size
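The incoherence described above is easy to reproduce. The following is a toy model in plain Python, with a hypothetical `LocalCacheMember` class and a dict standing in for the shared database; it is not Coherence code. Each farm member keeps its own copy of an entry, so a write through one member leaves the other member's copy stale.

```python
class LocalCacheMember:
    """One farm member holding its own private copy of cached entries."""

    def __init__(self, name, database):
        self.name = name
        self._database = database  # shared system of record
        self._local = {}           # this member's private cache

    def read(self, key):
        if key not in self._local:
            self._local[key] = self._database[key]
        return self._local[key]

    def write(self, key, value):
        self._database[key] = value
        self._local[key] = value   # only this member's copy is refreshed


database = {"EUR/USD": 1.35}       # illustrative value
member_a = LocalCacheMember("A", database)
member_b = LocalCacheMember("B", database)

member_a.read("EUR/USD")           # both members now hold a copy
member_b.read("EUR/USD")
member_a.write("EUR/USD", 1.36)    # update goes through member A only

stale = member_b.read("EUR/USD")   # B still sees its old copy
fresh = member_a.read("EUR/USD")
```

Without some invalidation or messaging mechanism between members, `member_b` keeps serving the old value indefinitely, and every member also pays the memory cost of its own copy.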
Scale out the Container (or blame the App Server)
Approach: Use an Application Container (“Our magical clustered container will scale our application infinitely”)
How:
• Believe the vendors & the marketing
• Follow a “scalability paradigm”
• Use a “Clustering Container”
• … It scaled the “Pet Store” linearly, therefore our X application will also scale linearly (where X ≠ “Pet Store”)
Advantages:
• Simple
• Well documented and communicable paradigm
• Easily scale development team
Disadvantages:
• Typically scales in-the-small
• Usually relies on “scale-up” rather than “scale-out”
• Requires specialized skills or products (outside of the standard paradigm) to really scale
Clustering is primarily about High-Availability, not Scalability!
Traditional Scale-Out Approaches…
#1. Avoid the challenge of maintaining consensus
• Opt for the “single point of knowledge”
#2. Have crude consensus mechanisms, that typically fail and result in data integrity issues (including loss)
Client + Server Model (Hub + Spoke)
Master + Worker Model (Grid Agents)
Active + Passive (High Availability)
Traditional Scale-Out Consequences…
• Have unbalanced / unfair load and task management
  • Some servers have greater system responsibility than others
• Have Single Points of Bottleneck (SPoB)
• Have Single Points of Failure (SPoF)
  • “Micro outages” are magnified as you scale-out
• Exhibit Strong Coupling to Physical Resources
  • Software completely dependent on individual physical servers
• Require specialized deployment and operation for individual Resources
  • Some servers require “special attention” to operate
The Coherence Approach
So how does Coherence solve the problem?
Consensus is the key…
Imagine a team where some members…
• Have a different impression of the actual members of the team
• Allocate tasks and information to their members (from their perspective) but on behalf of the team
• Result?
  • Inconsistent views of team information
  • Without consensus some information will be inconsistent (at best) or be unavailable or lost (at worst / common)
Real Madrid before Capello
Membership Consensus
• Consensus between resources is fundamental to ensure integrity of information (and work) when scaling-out
Real Madrid after Capello
Coherence relies on Consensus
• Traditional scale-out approaches limit
  • Scalability, Availability, Reliability and Performance
• In Coherence…
  • Servers share responsibilities (health, services, data…)
  • No SPoB
  • No SPoF
  • Massively scalable by design
• Logically servers form a “mesh”
  • No Masters / Slaves etc.
  • Members work together as a team
The result?
Oracle Coherence: In-Memory Data Grid
What is Coherence?
(c) Copyright 2007. Oracle Corporation
Oracle Coherence…
• Is an enabling technology that…
• Allows customers to build bulletproof applications…
• And achieve high performance and predictable scalability
Typical Coherence Customers
• Online gaming (e.g. trading system)
• Telcos (e.g. SMS backbone)
• Hospitality (e.g. flight reservation system)
• Insurance (e.g. user profile management)
• Financial Services (e.g. risk engine)
• Public sector (e.g. railway signalling)
Common theme: mission-critical, bulletproof solutions
• Reliability
• Availability
• Scalability
• Performance
Coherence doesn’t need an app server
There is a .NET client library…and this is pure .NET
…and…
There is a C++ client library…and this is pure C++
Where does Coherence fit?
Look at the shape of the data
Application Layers
• Web Server
• App Server
• DB Server
Network
Data “Shape” across tiers
• Web Tier: Web Servers and WebCache
  • HTML data structures in memory
  • Web Cache offloads Web Servers and improves network performance via compression
• Application Tier: Application Servers and Coherence
  • Java data structures in memory
  • Coherence caches Java structures in memory; very fast access to Java data in memory across the mid-tier grid
• Database Tier: RAC and TimesTen
  • SQL data structures in memory
  • TimesTen & RAC provide scalability to database data, improving query & transaction write performance
What is Coherence not?
• Plug and play - the application code will need to change.
• A database – persistent data will need to be written to a database (Oracle RAC is often an ideal fit).
• A Transaction Processing Monitor.
• A panacea for:
  • Inadequate hardware
  • Badly written applications
  • Poor database design
How Coherence Works
Coherence Works by Consensus
• Consensus is key
  • Communication is more efficient (peer-to-peer)
  • No outages for voting (no need – everyone is a peer)
  • No SPoF, SPoB
  • No need for broadcast traffic (yelling at each other)
• You can do many things once you have “consensus”.
Made possible by TCMP (the “secret sauce”)
Tangosol Cluster Management Protocol (TCMP)
• Coherence’s own protocol between cluster members
• TCMP utilizes UDP
  • Massively scalable
  • Asynchronous
  • Point-to-point
• UDP Multicast is used for:
  • New JVMs to join the cluster automatically
  • Maintaining cluster membership
  • Multicast is not required; it may be disabled with Well Known Addresses (WKA)
• UDP Unicast is used for most communication
  • Very fast and scalable
  • TCMP guarantees packet order and delivery
  • TCP/IP connections do not need to be maintained
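UDP itself guarantees neither ordering nor delivery, so a protocol layered on it has to add both. The slides do not describe how TCMP does this; the sketch below is a generic sequence-number scheme in plain Python, with a hypothetical `OrderedReceiver` class, shown only to illustrate the kind of bookkeeping involved.

```python
class OrderedReceiver:
    """Deliver packets in sequence order, whatever order they arrive in."""

    def __init__(self):
        self._next_seq = 0     # next sequence number owed to the application
        self._pending = {}     # out-of-order packets held back, keyed by sequence number
        self.delivered = []    # payloads handed to the application, in order

    def on_packet(self, seq, payload):
        if seq < self._next_seq or seq in self._pending:
            return             # duplicate (for example a retransmission): ignore
        self._pending[seq] = payload
        # Release every packet that is now contiguous with what was delivered.
        while self._next_seq in self._pending:
            self.delivered.append(self._pending.pop(self._next_seq))
            self._next_seq += 1

    def missing(self):
        """Sequence numbers a sender would need to retransmit."""
        if not self._pending:
            return []
        return [s for s in range(self._next_seq, max(self._pending))
                if s not in self._pending]


receiver = OrderedReceiver()
# Packets arrive reordered, with one duplicate and one gap (sequence 3).
for seq, payload in [(0, "join"), (2, "put B"), (2, "put B"), (1, "put A"), (4, "get A")]:
    receiver.on_packet(seq, payload)
gaps = receiver.missing()
```

Here the receiver delivers the first three payloads in order, holds back sequence 4, and reports sequence 3 as the gap to be retransmitted.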
Distributed caching for your data…
…and go faster stripes for your data
Hardware implications (Blades not Bludgeons)
Big Iron
• Buy based on predicted growth
• High incremental cost
Low cost clusters
• Buy as you grow
• Small increments at present day prices & clock speeds
Using Coherence
Building an Application
• Developers use Coherence API to
  • Access Data
  • Listen for Events
  • Query Data
  • Process Data in the Grid
Setting up a grid
• Coherence clusters to form a grid out of the box
• A grid may contain many caches
• A cache structure is defined by a scheme
• Schemes are defined in config files
Distributed Data Management (access)
The Distributed Scheme
(one of many)
In-Process Data Management
Distributed Data Management (update)
Distributed Data Management (failover)
Distributed Data Management
• Members have logical access to all Entries
  • At most 2 network operations for Access
  • At most 4 network operations for Update
  • Regardless of Cluster Size
  • Deterministic access and update behaviour (performance can be improved with local caching)
• Predictable Scalability
  • Cache Capacity Increases with Cluster Size
  • Coherence Load-Balances Partitions across Cluster
  • Point-to-Point Communication (peer to peer)
  • No multicast required (sometimes not allowed)
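The "at most 2 / at most 4 network operations" bounds follow from each entry having one primary owner and one backup. The toy model below, in plain Python, counts the hops. `PartitionedCache`, the CRC-based key hashing, the partition count of 257 and the "next member holds the backup" rule are all assumptions made for the sketch, not Coherence's actual partition assignment.

```python
import zlib

PARTITION_COUNT = 257  # assumed value for this sketch


class PartitionedCache:
    """Each key has one primary owner and one backup, on different members."""

    def __init__(self, members):
        self.members = list(members)
        self.stores = {m: {} for m in self.members}
        self.hops = 0  # network operations counted so far

    def _owners(self, key):
        partition = zlib.crc32(key.encode()) % PARTITION_COUNT
        primary = partition % len(self.members)
        backup = (primary + 1) % len(self.members)  # simplistic backup placement
        return self.members[primary], self.members[backup]

    def put(self, caller, key, value):
        primary, backup = self._owners(key)
        if caller != primary:
            self.hops += 2  # request to the primary and its response
        self.hops += 2      # primary to backup and the acknowledgement
        self.stores[primary][key] = value
        self.stores[backup][key] = value

    def get(self, caller, key):
        primary, _ = self._owners(key)
        if caller != primary:
            self.hops += 2  # request to the primary and its response
        return self.stores[primary].get(key)


cache = PartitionedCache(["node-1", "node-2", "node-3", "node-4"])
cache.put("node-1", "trade-42", {"qty": 100})
put_hops = cache.hops

cache.hops = 0
value = cache.get("node-3", "trade-42")
get_hops = cache.hops
```

The hop counts depend only on whether the caller happens to own the key, never on how many members the cluster has, and each added member contributes its own store to the total capacity.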
Data Distribution: Clients and Servers
“Clients” with storage disabled
“Servers” with storage enabled
Near Caching (L1 + L2) Topology
Observing Data Changes
Parallel Queries
Parallel Processing and Aggregation
Data Source Integration (read-through)
Data Source Integration (write-through)
Data Source Integration (write-behind)
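The diagrams for the three data source integration slides did not survive in this transcript, so here is a rough sketch of what the three terms conventionally mean. It is plain Python with a hypothetical `BackedCache` class and a dict standing in for the database; it is not the Coherence cache store API.

```python
class BackedCache:
    """Cache in front of a data store, with read-through and two write policies."""

    def __init__(self, store, write_behind=False):
        self.store = store              # dict standing in for the database
        self.write_behind = write_behind
        self._entries = {}
        self._dirty = {}                # queued writes not yet persisted

    def get(self, key):
        # Read-through: on a miss the cache itself loads from the store.
        if key not in self._entries and key in self.store:
            self._entries[key] = self.store[key]
        return self._entries.get(key)

    def put(self, key, value):
        self._entries[key] = value
        if self.write_behind:
            self._dirty[key] = value    # deferred; repeated updates to a key coalesce
        else:
            self.store[key] = value     # write-through: persisted before put returns

    def flush(self):
        """Persist queued writes; returns how many store writes were issued."""
        written = len(self._dirty)
        self.store.update(self._dirty)
        self._dirty.clear()
        return written


db = {"acct-1": 100}
cache = BackedCache(db, write_behind=True)
cache.get("acct-1")          # read-through load
cache.put("acct-1", 90)
cache.put("acct-1", 80)      # second update to the same key
before_flush = db["acct-1"]  # the database has not been touched yet
writes = cache.flush()
```

With write-behind, the two updates reach the database as a single write of the latest value; the trade-off is that the database lags the cache until the flush happens.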
Coherence*Extend
WAN Topology
Oracle Coherence in Action
Example Use Cases
• Mainframe Cost Reduction
  • Caching repeated queries
• Oracle Coherence with Compute Grid
  • Intra-day risk calculation
• Oracle Coherence Cloud
  • Message-based infrastructure replacement
• Eliminating SPoB
  • Trading Exchange Redevelopment
Mainframe Cost Reduction
Taming the MIP Monster
• Retail banking IT provider
  • Supports 400+ banks
  • 4 key systems – repeated queries to mainframe
  • 100,000 queries to mainframe each day
  • Large recurring cost to the business
• Coherence deployed as distributed cache
  • 100,000 queries → 1,600 queries
  • Saving ~€1,000,000 in 1st year
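As a quick check on what those two query counts imply, the arithmetic can be worked through directly. The variable names are ours; the only inputs are the two figures from the slide.

```python
queries_before = 100_000  # mainframe queries per day before caching (from the slide)
queries_after = 1_600     # mainframe queries per day with the distributed cache

avoided = queries_before - queries_after
reduction_pct = 100 * avoided / queries_before
reduction_factor = queries_before / queries_after

print(f"{avoided:,} queries avoided per day")
print(f"{reduction_pct:.1f}% reduction, a factor of {reduction_factor:.1f}")
```

That is 98,400 fewer mainframe queries a day, a 98.4% reduction.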
Oracle Coherence with Compute Grid
Compute Grid on Database
Traditional Compute Grid
Grid Manager
Grid Tasks
• Emphasis on orchestrating tasks out to compute nodes in grid
• Data Set either loaded locally or pulled off of back end data source
• Applications Highly Customized for Grid Environment
Grid Applications
Great processing scalability with inevitable data bottlenecking
Orchestration can be point of bottleneck as well
Compute Grid on Data Grid
Oracle Coherence
Oracle RAC
Traditional Compute Grid with Data Scale Out
High Performance Computing (HPC)
Grid Manager
Grid Tasks
Grid Applications
• Oracle Coherence Data Grid Overlay onto Compute Grid
• Compute Grid Scale Out with Data Fault Tolerance
• Massive Persistent Scale Out with Oracle RAC
Customer Story: Wachovia
Scenario
• Wachovia Investment Bank introducing “Service Oriented Infrastructure (SOI)”
• Requires absolute data availability for complex Grid Computations
Problem
• Existing Compute Grid infrastructure suffering from data latency and throughput problems
• Complex calculations take so long that the results are outdated
Solution
• Data Grid overlay on Compute Grid
• Enable risk calculations to fully utilize the grid hardware by having real-time access to in-memory data as well as parallelization
• Reduced critical risk computation from 50 days to under 1 hour!
Over 300 CPUs in Production!
Oracle Coherence Cloud
The challenge: Scale this...
• Domain: Retail Banking Infrastructure
  • Over 500 Banks
  • 100,000+ Teller Staff Desktop Applications
  • 10,000+ Cash Machines (ATMs)
  • 10,000,000’s of Internet Banking Transactions/day
• Current Infrastructure
  • Java SE based (no J2EE – apart from Servlets)
  • Oracle RAC (not an issue – scaling across a WAN)
  • Messaging (serious challenges)
  • Processing Business Tasks (challenges approaching)
  • 30,000,000+ Business Tasks a day – minimum.
    • Must do 100,000,000 per day effortlessly before going live
The challenge continued: Scale this...
• Execution of Business Tasks
  • Account Balance, Credit/Debit, Funds Transfer, Statement Processing, Batch Processing, Payment Processing
  • Tasks arrive from a variety of clients (thin, rich, cross-platform, mainframes...) – variety of languages
• Goal:
  • Tasks are executed by the “cloud”
  • Don’t want to build own “cloud” software
• Their knowledge:
  • Massive experience in scale-out. Could build it themselves, but budget (time/resources/money) will be saved by buying.
The Cloud
Architectural issue: Performance Bottleneck Between Tiers
A HUGE performance bottleneck:
Volume / Complexity / Frequency of Data Access
[Diagram: Application (Java, object) on one side, Database (SQL, relational) on the other]
(in some companies, this would be the time to blame the DBA)
Constraints...
• No Single Points of Failure
• No Single Points of Bottleneck
• No Service Registries
• No Masters + Workers
  • already got one that is partitioned into over 200 separate clusters
• No Manual Partitioning
• Keep everything in Memory
• Active + Active Sites
  • Across WAN
• Develop system on a notebook
• Scale to over 500 servers
• No reconfiguration outages
• No byte-code manipulation / proxies
• No Data or Task Loss
  • During failure
  • During server upgrade
  • During scale out
• No Transactions (XA)
• Support multiple versions
• Predictable response times
• Predictable scale out costs
• Manage via JMX, from any point in the “Cloud”.
• Pure Java Standard Edition
• Infrastructure adds a maximum of 3ms latency to tasks.
• Integrate with existing applications (Java 1.4.2+)
Approach
(c) Copyright 2008. Oracle Corporation
• Business Tasks are regular Java objects (POJOs)
• Place Business Tasks into Coherence
  • Coherence dynamically distributes Tasks across the Cluster
  • Tasks are resilient in the Cluster
  • May use “affinity” to ensure related Tasks processed together
  • Coherence triggers task processing
• Scaling out Coherence = Scaling out Task Processing
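A minimal sketch of this approach, in plain Python rather than the Coherence API: `FundsTransfer`, `TaskGrid`, the CRC-based routing and the choice of the account as affinity key are all assumptions made for illustration. A task is a plain object, submitting it decides which member owns it, and the submission itself triggers processing.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FundsTransfer:
    """A business task as a plain object (fields are illustrative)."""
    task_id: str
    account: str
    amount: int

    def execute(self) -> str:
        return f"{self.task_id}: moved {self.amount} from {self.account}"


class TaskGrid:
    """Routes each task to a member chosen from its affinity key, then runs it."""

    def __init__(self, members):
        self.members = list(members)
        self.log = []  # (member, result) pairs, in processing order

    def owner_of(self, affinity_key: str) -> str:
        index = zlib.crc32(affinity_key.encode()) % len(self.members)
        return self.members[index]

    def submit(self, task: FundsTransfer) -> str:
        member = self.owner_of(task.account)        # related tasks land on the same member
        self.log.append((member, task.execute()))   # placing the task triggers processing
        return member


grid = TaskGrid([f"server-{i}" for i in range(4)])
m1 = grid.submit(FundsTransfer("t1", "acct-7", 50))
m2 = grid.submit(FundsTransfer("t2", "acct-7", 25))  # same account, same member
```

Because ownership is computed from the key rather than looked up in a registry, there is no master deciding who runs what. A real grid would also rebalance ownership and keep backup copies when members join or leave, which this sketch omits.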
List of the Performed tests
Scalability Test
Guaranteed Delivery Test
Failover Test
Server Joining Test
Unattended Long Term Test
Results
• While submitting Tasks (regular system load)
  • Test 1: Scale from 1 server to over 400
    • No reconfiguration
  • Test 2: Randomly kill servers
    • No reconfiguration
  • Test 3: Kill 1, 2, 4, 8, 16, 32, 64, 128, 160 servers at once
    • No data loss
• Possible 1,200,000,000 Tasks execution capacity per day
• Client may reduce current hardware costs by 75%
Eliminating Single Point of Bottleneck
Trading Exchange
• Similar requirements and constraints
  • Order processing (Foreign Exchange)
  • 1,000’s per second (initial) per currency pair
  • No manual partitioning
  • No transactions
  • 10ms max latency for full accept, validate, match, respond
• Achieved with Coherence using BMLs (< 3ms)
  • 14 weeks development (start to go live)
Previous Approach (failed to meet SLAs)
Coherence-based Solution
Conclusion
Oracle Coherence…
• Is an in-memory object data grid, providing
  • Scalability
  • Availability
  • Reliability
  • Performance
• Supports many mission-critical apps, especially in Financial Services
• Integrates with and supports other technologies:
  • Compute Grids
  • Database Grids
  • C++, .NET
• Is a key component of Oracle’s XTP platform
Q & A