Eugene Ciurana [email protected] - Amazon Web...
Letʼs move the Java world!
High-Availability, Fault Tolerance, and Resource-Oriented Computing
Eugene Ciurana [email protected] - pr3d4t0r ##java, irc.freenode.net
This presentation is available from:
http://ciurana.eu/GeeCON-2010
About Eugene...
• 15+ years building mission-critical, high-availability systems
• 14+ years of Java work
• Open source evangelist
• Official adoption of open source/Linux at Walmart worldwide
• State of the art main line of business at the largest companies in the world - not a web guy!
What You’ll Learn...
• Decoupled, event-driven, resource-oriented systems are more flexible
• Avoid tight, point-to-point integration
• Enhance JVM-based apps with better domain-specific languages
• How to move away from monolithic app servers and architectures
• How to implement event-driven systems by leveraging existing infrastructure and SOA investments
• Treat computational resources as addressable entities
• Balance open source vs. commercial products
Very Important!
Please Ask Questions! (don’t be shy)
What is Scalability?
• Scalability is the property of a system to:
• handle larger amounts of work; or
• be easily expanded in response to increased demand on network, processing, database, or file resources
• Types of scalability
• Horizontal (out): add more nodes with the same functionality as the existing ones and redistribute the load
• Vertical (up): expand by adding more cores, main memory, storage, or network interfaces
Horizontal Scalability
[Diagram: a load balancer in front of three nodes grows to a load balancer in front of four nodes. Scales out. Clustering!]
Vertical Scalability
[Diagram: a dual-core, single-processor machine with 16 MB RAM hosting Virtual Nodes 0-2 becomes a dual-core, dual-processor machine with 32 MB RAM hosting Virtual Nodes 0-3. Scales up.]
What is Availability?
• How well a system provides useful resources over a set period of time
• High availability guarantees an absolute degree of functional continuity within a time window
• Expressed as a relationship between uptime and unplanned downtime
• A = 100 - (100*D/U); D, U expressed in minutes
• Beware: uptime != available
The Nines Game
Availability %   Downtime (minutes)   Downtime/year   Vendor jargon
90               52560.00             36.5 days       one nine
99               5256.00              3.7 days        two nines
99.9             525.60               8.8 hours       three nines
99.99            52.56                53 minutes      four nines
99.999           5.26                 5.3 minutes     five nines
99.9999          0.53                 32 seconds      six nines
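The availability formula and the downtime column of the Nines table can be checked with a few lines of Java. This is a minimal sketch, not part of the original slides. It assumes U is the total number of minutes in the measurement window (525,600 for a year), which is the reading that reproduces the table; the class and method names are made up for illustration.

```java
/** Sketch of the availability arithmetic; U is read as the total minutes in the window (assumption). */
final class Availability {
    static final double MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600

    /** A = 100 - (100 * D / U), with D and U in minutes. */
    static double percent(double downtimeMinutes, double windowMinutes) {
        return 100.0 - (100.0 * downtimeMinutes / windowMinutes);
    }

    /** Inverse of the formula: unplanned downtime per year allowed by a target availability. */
    static double allowedDowntimeMinutesPerYear(double availabilityPercent) {
        return MINUTES_PER_YEAR * (100.0 - availabilityPercent) / 100.0;
    }

    public static void main(String[] args) {
        // Prints one line per row of the Nines table.
        for (double a : new double[] {90, 99, 99.9, 99.99, 99.999, 99.9999}) {
            System.out.printf("%.4f%% -> %.2f minutes/year%n", a, allowedDowntimeMinutesPerYear(a));
        }
    }
}
```

Calling `Availability.percent(52560, Availability.MINUTES_PER_YEAR)` goes the other way, from a year's unplanned downtime back to the availability percentage.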
Service Level Agreements
• SLAs are negotiated terms that outline the obligations of the two parties delivering and using a system
• System type - not all systems require the same SLA
• Levels of availability
• Minimum
• Target
• Uptime
• Network
• Power
• Maintenance windows
• Serviceability
• Performance and metrics
• Billing
SLAs help determine if you scale up or out
Load Balancers
• They work by spreading requests among two or more resources
• Implemented in hardware or in software
• Multiple machines
• Multiple processes
• Multiple threads
• Resources appear as a single device to consumers
• Can be stateless (web services), or stateful (applications that require session management)
• Algorithms determine the distribution
• 1/n == all systems equally likely to service
• Special requests (e.g. music store) - some servers get hit more than others
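The 1/n distribution can be sketched as a round-robin selector. This is an illustrative example, not code from the talk. The class name is hypothetical, and the node addresses are the ones used in the diagrams of this deck.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Round-robin selection: with n nodes, each node services 1/n of the requests. */
final class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    /** Returns the node that should service the next request. */
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("192.168.202.55", "192.168.202.66", "192.168.202.67", "192.168.202.69"));
        for (int request = 1; request <= 6; request++) {
            System.out.println("R" + request + " -> " + lb.pick());
        }
    }
}
```

A weighted or content-aware algorithm would replace `pick()` for the "special requests" case, where some servers are meant to receive more traffic than others.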
Load Balancers
[Diagram: a consumer sends requests R1, R2, R3 (Rn: R = request, n = sequence number) to the load balancer at 74.0.125.28, which sits in front of the nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.]
Persistent Load Balancers
[Diagram: three consumers connect to the sticky load balancer at 74.0.125.28, which sits in front of the nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.]
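One way to get persistence is for the balancer to remember which node it assigned to each consumer. The sketch below assumes consumers can be identified by a string such as a session id; that identifier, and the class itself, are hypothetical and not taken from the slides.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sticky selection: a consumer is assigned a node once and keeps it afterwards. */
final class StickyBalancer {
    private final List<String> nodes;
    private final Map<String, String> assignments = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger();

    StickyBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    /** New consumers are spread round-robin; known consumers return to their node. */
    String pick(String consumerId) {
        return assignments.computeIfAbsent(consumerId,
                id -> nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size())));
    }

    public static void main(String[] args) {
        StickyBalancer lb = new StickyBalancer(List.of("192.168.202.55", "192.168.202.66"));
        System.out.println("alice -> " + lb.pick("alice"));
        System.out.println("bob   -> " + lb.pick("bob"));
        System.out.println("alice -> " + lb.pick("alice")); // same node as the first call
    }
}
```

The assignment table is itself state, so a real sticky balancer also has to decide what happens to it when a node or the balancer fails.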
Load Balancing and Databases
[Diagram: a consumer connects to the load balancer at 74.0.125.28, which sits in front of the nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69, with a Session Data store.]
Caching Strategies
• Stateful load balancing requires data sharing
• Caching distributes popular, shared read-only data
• Think of a cache as a giant hash map
• If the data isn’t in the cache, fetch it from the database
• Write policies:
• write-through: write to the cache AND database
• write-behind: write to the cache only and mark the entry “dirty”; the database is updated later
• no-write allocation: only read requests are cached; assumes data never changes
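The first two write policies can be contrasted in a small sketch. It models both the cache and the database as in-memory maps, which is an assumption made only to keep the example self-contained. It also uses the common reading of write-behind, in which dirty entries are flushed to the database at some later point. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Two write policies over a cache and a database, both modelled as in-memory maps. */
final class WritePolicies {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> database = new HashMap<>(); // hypothetical stand-in for a real store
    private final Set<String> dirty = new HashSet<>();

    /** Write-through: the cache AND the database are updated together. */
    void writeThrough(String key, String value) {
        cache.put(key, value);
        database.put(key, value);
    }

    /** Write-behind: only the cache is updated now; the key is marked dirty. */
    void writeBehind(String key, String value) {
        cache.put(key, value);
        dirty.add(key);
    }

    /** Pushes every dirty entry to the database and clears the dirty marks. */
    void flush() {
        for (String key : dirty) {
            database.put(key, cache.get(key));
        }
        dirty.clear();
    }

    public static void main(String[] args) {
        WritePolicies store = new WritePolicies();
        store.writeThrough("price", "10");
        store.writeBehind("stock", "7");
        System.out.println("before flush: " + store.database);
        store.flush();
        System.out.println("after flush:  " + store.database);
    }
}
```

Write-behind trades durability for speed: anything still marked dirty is lost if the cache node fails before `flush()` runs.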
Caching Usage Pattern
• Application caching
• Little or no programmer participation (e.g. Terracotta)
• Explicit API calls (memcached, Coherence, etc.)
• Web caching - stores full documents or fragments (‘particles’) on the server or client, and is invisible to the client
• Web accelerators - distribute the load (e.g. CDN like S3, Akamai, etc.)
• Proxy caches - distribute requests to the same resources and may provide filtering/query (e.g. Squid, Apache, ISA servers)
Caching Usage Pattern
[Flowchart: Begin, then decide whether the request is a query or an update.
Query: fetch datum from cache; if datum is None, query datum from database and add datum to cache; use datum in app; End.
Update: update datum in database; invalidate cache; add or update datum to cache.]
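The query and update branches of this pattern can be sketched with explicit cache calls. The database is again a plain map standing in for a real store, and the class and its read counter are hypothetical. On update this sketch only invalidates the cached entry, so the next query reloads it; re-adding the new value right away is the other option named in the flowchart.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Explicit-API caching: look in the cache first, fall back to the database on a miss. */
final class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database; // hypothetical stand-in for a real store
    int databaseReads = 0;                       // counts cache misses, for illustration only

    CacheAside(Map<String, String> database) {
        this.database = database;
    }

    /** Query branch: fetch from cache; on a miss, query the database and add to cache. */
    String query(String key) {
        String datum = cache.get(key);
        if (datum == null) { // the "datum is None" decision
            datum = database.get(key);
            databaseReads++;
            if (datum != null) {
                cache.put(key, datum);
            }
        }
        return datum;
    }

    /** Update branch: update the database, then invalidate the cached copy. */
    void update(String key, String value) {
        database.put(key, value);
        cache.remove(key);
    }

    public static void main(String[] args) {
        Map<String, String> db = new ConcurrentHashMap<>(Map.of("user:1", "Ada"));
        CacheAside app = new CacheAside(db);
        System.out.println(app.query("user:1") + " (database reads: " + app.databaseReads + ")");
        System.out.println(app.query("user:1") + " (database reads: " + app.databaseReads + ")");
        app.update("user:1", "Grace");
        System.out.println(app.query("user:1") + " (database reads: " + app.databaseReads + ")");
    }
}
```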
Distributed Caching
Load Balanced Configuration or Datagram
[Diagram: a consumer connects to the load balancer at 74.0.125.28, which sits in front of the nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69, with Cache 0, Cache 1, Cache 2, Cache 3, and one database.]
Clustering
• Cluster - two or more systems that appear to users as a single system
• A clustered (horizontally scalable) system is more cost-effective than a monolithic (vertically scalable) single system with the same performance characteristics
• Systems in the cluster are connected over high-speed LANs like Gigabit Ethernet, FDDI, InfiniBand, Myrinet, etc.
A/A Clustering
• A/A == Active/Active
• Distribute the load evenly among multiple nodes
• All nodes offer the same capabilities
• All nodes are active at the same time
[Diagram: a consumer connects to the load balancer at 74.0.125.28, which sits in front of the nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.]
High-Availability A/P Cluster
• A/P == Active/Passive
• Provides uninterrupted service through redundant nodes
• Eliminates single points of failure
• Two nodes minimum, and “heartbeat” detection
• Automatic traffic switch for fail-over
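[Diagram: a consumer connects to the router at 74.0.125.28, which sits in front of the active node 192.168.202.55 and the failover node 192.168.202.69. A heartbeat links the two nodes. Other labels: State Data Cache; Database and Failover Database, with replication or a clustered database.]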
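Heartbeat detection and the automatic traffic switch can be sketched as a small state machine. The timeout value, the class name, and the choice to stay on the failover node until someone fails back by hand are all assumptions; the node addresses come from the diagram. Time is passed in explicitly so the logic is easy to follow and to test.

```java
/** Fail-over decision for an active/passive pair, driven by heartbeat timestamps. */
final class HeartbeatMonitor {
    static final String ACTIVE = "192.168.202.55";
    static final String FAILOVER = "192.168.202.69";

    private final long timeoutMillis;
    private long lastHeartbeatMillis;
    private boolean failedOver = false;

    HeartbeatMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Called whenever a heartbeat from the active node arrives. */
    void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** Returns the node that should receive traffic at the given time. */
    String route(long nowMillis) {
        if (!failedOver && nowMillis - lastHeartbeatMillis > timeoutMillis) {
            failedOver = true; // assumption: stay on the failover node until a manual fail-back
        }
        return failedOver ? FAILOVER : ACTIVE;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(3_000, 0);
        monitor.onHeartbeat(2_000);
        System.out.println("t=4.5s -> " + monitor.route(4_500));
        System.out.println("t=6.0s -> " + monitor.route(6_000)); // no heartbeat for 4 seconds
    }
}
```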
Grid
• Process loads as independent jobs
• Nodes don’t require data sharing
• Storage, network may be shared by all nodes
• Intermediate results have no bearing on other jobs' progress
• Each node is independent
• Map/Reduce (Hadoop)
[Diagram: Consumer, Master, and two load balancers, each with four nodes.]
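The idea of independent jobs followed by a reduce step can be shown on one machine, with a thread pool standing in for the grid nodes. This is only an analogy under that assumption; it is not Hadoop code, and the word-count job and class name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Independent jobs on a worker pool (stand-in for grid nodes), then a reduce step. */
final class GridJobs {
    /** Map: count the words of each chunk independently. Reduce: sum the counts. */
    static int countWords(List<String> chunks, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (String chunk : chunks) {
                // Each job reads only its own chunk; no data is shared between jobs.
                jobs.add(() -> chunk.isBlank() ? 0 : chunk.trim().split("\\s+").length);
            }
            int total = 0;
            for (Future<Integer> result : pool.invokeAll(jobs)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("process loads as independent jobs", "each node is independent");
        System.out.println("words: " + countWords(chunks, 2));
    }
}
```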
Computational Cluster
• Used for operations that require raw computational power
• Not good for transactional operations (web, database)
• Tightly coupled nodes, homogeneous, close proximity
• Meant to replace supercomputers
[Diagram: Consumer, Master, and two rows of four nodes.]
Redundancy and Fault Tolerance
• Redundancy - the expectation that any system component failure is independent of failure in other components
• Fault tolerance - the system continues to operate in the event of component failure
• May have decreased throughput
Fault tolerance results from SLAs
Fault Tolerance SLA Requirements
• No single point of failure - redundant components ensure continuous operation
• Allow repairs without disruption of service
• Fault isolation - problem detection must pinpoint the specific faulty component
• Fault propagation containment - problems in one component must not cascade to others
• Reversion mode - the system can be set back to a known state on command
A/A Cluster Fault Tolerance
• Uninterruptible, scalable service (stateless, web services)
• Failure transparency - though service may be degraded
• Ideal for event-based web services (SOAP, REST, JMS, etc.)
• No dependencies between nodes
[Diagram: a consumer connects to the load balancer at 74.0.125.28, which sits in front of the nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69, plus a replacement node at 192.168.202.53.]
A/P Cluster Fault Tolerance
• High availability through redundancy and failure detection
• Higher cost - used for stateful systems
• May require active sys- or netadmin participation
• More moving parts - more things to coordinate
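[Diagram: a consumer connects to the router at 74.0.125.28, which sits in front of the node 192.168.202.55 and the failover node 192.168.202.69. A heartbeat links the two nodes. Other labels: State Data Cache; Database and Failover Database.]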
Putting It All Together
ROC Architecture
• ROC = Resource-Oriented Computing
• Everything is a resource (computational, data, other)
[Diagram: ROC architecture built around Mule ESB. Labels: Internet; service providers (UPS, FedEx); single sign-on with Active Directory and legacy auth (LDAP, SOAP, mainframe / RACF); CRM; product catalogue; product support pages (three instances); Remedy; service object (business logic); web app; web browser; GUI app; dedicated API; three transformers. Protocol labels: HTTP, XML; JDBC; SOAP; TCP pass-through; JMS, SOAP, etc.]
SOA and Computational Network
Real-Life Example - LeapFrog
[Diagram: an end-user system (Mac, Windows) runs LeapFrog Connect and a web browser, with connected products attached over USB, and reaches the LeapFrog systems over the Internet through a firewall. Other labels: S3 content repository; third-party partner site; www.leapfrog.com; LearningPath; content management system (REST, JCR); content authoring; device logs; Crowd SSO; customer data; game play data; user credentials; servlets; app logic.]
• Mule ESB backbone: HTTP, SOAP (CXF), REST, etc. routing, filtering, and dispatching; ActiveMQ JMS broker; dedicated LeapFrog services
• Mule ESB tailbone: connected products SOAP, REST web services
• Mule ESB funnybone: device log upload, processing, servlet container
Real-Life Example - LeapFrog
Backbone - message filtering, routing, dispatching, queuing, events
[Diagram: traffic from the Internet reaches a load balancer in front of two application servers (Tomcat 6) and a services proxy. Behind them:
Load Balancer - Backbone: four Mule ESB 1.6.2 instances
Load Balancer - Tailbone: two Mule ESB instances (SOAP, REST), with a database
Load Balancer - Funnybone: two Mule ESB instances (servlet, MTOM), with an NFS share
Load Balancer - Message Broker: two ActiveMQ instances, with an NFS share]
Mule SOA Applied Clustering
[Diagram: external applications call http://server.mycompany.com/service_call on the load balancer, which fronts http://mule_server_1/service_call and http://mule_server_2/service_call. Mule ESB as Application Container 1 and Container 2 each host Service 1, Service 2, and Service 3.]
* Two or more Mule instances can provide services, for scalability if there is high demand
* Load balanced configuration has built-in fail-over
* External apps see a single point of entry: the service endpoint name
* Load balancer or proxy sends the request to any available Mule server
* Increased demand - add another Mule server without interrupting the existing ones
* Decreased demand - remove Mule servers without interrupting other servers
* This is an active/active configuration - any server can handle a request at any time
* Assumes that the service application components are stateless
Mule SOA - ESB App Failover
[Diagram: external applications call http://server.mycompany.com/service_call on the load balancer, which fronts http://mule_server_1/service_call and http://mule_server_2/service_call. Mule ESB as Application Container 1 and Container 2 each host Service 1, Service 2, and Service 3.]
* A/A configuration uses the load balancer to dispatch service calls
* The load balancer takes a failing service out of rotation automatically
* Failure reason no. 1: network connectivity
* Failure reason no. 2: Mule container
* Failure reason no. 3: Service application bug
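Taking a failing server out of rotation can be sketched by adding a health set to a round-robin selector. How failures are detected (health checks, connection errors) is left out; `markFailed` and `markRecovered` are hypothetical hooks, and the endpoint URLs are the ones from the diagram.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Active/active dispatch that skips servers currently marked as failed. */
final class FailoverRotation {
    private final List<String> servers;
    private final Set<String> outOfRotation = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    FailoverRotation(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    void markFailed(String server) {
        outOfRotation.add(server);
    }

    void markRecovered(String server) {
        outOfRotation.remove(server);
    }

    /** Round-robin over the servers, trying each one at most once per call. */
    String pick() {
        for (int attempt = 0; attempt < servers.size(); attempt++) {
            String candidate = servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
            if (!outOfRotation.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no Mule server is available");
    }

    public static void main(String[] args) {
        FailoverRotation lb = new FailoverRotation(
                List.of("http://mule_server_1/service_call", "http://mule_server_2/service_call"));
        System.out.println(lb.pick());
        lb.markFailed("http://mule_server_1/service_call");
        System.out.println(lb.pick());
        System.out.println(lb.pick()); // still served while one server is out of rotation
    }
}
```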
Uninterrupted Application Updates
[Diagram: four stages over time, each with a load balancer in front of two Mule ESB application servers: (1) version 1.4 and version 1.4; (2) version 2.0 and version 1.4; (3) version 2.0 and version 1.4; (4) version 2.0 and version 2.0.]
* Allow stopping and deploying new application functionality without stopping services
* Allow upgrades to a country's configuration without affecting other countries or stopping services
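The update sequence can be sketched as a plan that upgrades one server at a time while the others keep serving. The sketch only tracks version labels; taking a server out of the load balancer, deploying, and putting it back are represented by a single step each. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Rolling update: upgrade one server at a time so the service never stops. */
final class RollingUpdate {
    /** Returns the versions deployed across the servers after each step of the rollout. */
    static List<List<String>> plan(List<String> versions, String target) {
        if (versions.size() < 2) {
            throw new IllegalArgumentException("at least two servers are needed to stay available");
        }
        List<String> current = new ArrayList<>(versions);
        List<List<String>> steps = new ArrayList<>();
        steps.add(List.copyOf(current)); // starting state
        for (int i = 0; i < current.size(); i++) {
            if (!current.get(i).equals(target)) {
                // Server i leaves rotation, is upgraded, and rejoins; the rest keep serving.
                current.set(i, target);
                steps.add(List.copyOf(current));
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (List<String> step : plan(List.of("1.4", "1.4"), "2.0")) {
            System.out.println(step);
        }
    }
}
```

This only works if versions 1.4 and 2.0 can serve requests side by side during the rollout, which is an assumption the service interfaces have to support.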
Database Replication
[Diagram: a primary cluster with Node 0 and Node 1 running the ESB as app services provider. Partition 0 holds DB 0 and DB 0b; Partition 1 holds DB 1 and DB 1b.]
Application Deployment
[Diagram: two load balancers with Mule 1, Mule 2, Mule 3, and Mule 4, plus Mule 5 as failover, and two active JMS queuing instances.]
Application Deployment
[Diagram: three stacks on multi-core Intel or AMD processors, each with Linux, a virtual machine, and Java 6: (1) JBoss hosting Application 1 and Application 2; (2) a Mule ESB container hosting Web Service 1 and Web Service 2; (3) MQ.]
Simplify the architecture by having a common platform for all systems. This platform can be replicated across multiple data centers.
* Virtual Machine: VMware or Xen hosted on Windows; consider Amazon EC2 as a viable, low-cost alternative
* Linux: Ubuntu Server
* PowerBuilder applications (end-user) migrate to JBoss + Wicket or a similar configuration
* All web services are hosted by Mule ESB
* The Mule ESB and JBoss servers are separate from one another
* MQ clusters have a similar architecture; JBoss messaging and WebSphere MQ
* Java 6 as a minimum
This architecture has a lower cost of operation, reduces power consumption, and simplifies administration.
Application Deployment
[Diagram: requests from the Internet pass through an app balancer and a services balancer to two virtual hosts (Intel, AMD). Each host runs Web Services (active), Application (active), MQ (master on one host, slave on the other), and a distributed cache. Two disks sit on a SAN.]
Each data center will have a cluster of two or more physical systems.
Each system will virtually host two or more applications/environments deployed as described in the previous diagram.
The system is designed for horizontal scalability (more traffic, more virtual or physical servers).
The system has inherent fail-over built in.
App and service requests may come from the open Internet.
Use physical load balancers; these can be Linux systems or dedicated F5 balancers, separate from the cluster.
Application Deployment
[Diagram: Data Center Europe, Data Center US, and Data Center Japan, connected over the Internet, each listed with two App Clusters. Other labels: Claims Mgmt (twice), Informix, Expert, and three Legacy Systems.]
Each data center has an application cluster.
The app clusters have identical configurations; only the app itself may vary by locale.
A designated data center also functions as the global services processing hub; all applications talk to this cluster (e.g. Claims Management) regardless of where the calling app is located.
The global services clusters are physically and logically separate from the application clusters, which may include locale-specific web services and data stores.
Application Deployment
[Diagram: a primary cluster and a secondary cluster with the same layout: Node 0 and Node 1 running the ESB as app services provider, Partition 0 with DB 0 and DB 0b, and Partition 1 with DB 1 and DB 1b. Also shown: an Enterprise Service Bus (routing, queuing, transformation, transactions, dispatching) and a queue.]
Q&A
Comments?
Anything else?
Eugene Ciurana [email protected] - pr3d4t0r ##java, irc.freenode.net
http://ciurana.eu/scalablesystems
This presentation is available from: http://ciurana.eu/GeeCON-2010
Twitter: ciurana