Clustering In The Wild

Sergio Bossa ([email protected]) - Pronetics/Sourcesense

Ugo Landini ([email protected]) - Pronetics/Sourcesense

Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License

Clustering in the wild

● Ugo Landini

– CTO, Sourcesense

● Sergio Bossa

– Software Architect, Sourcesense




Agenda

● Why Clustering?

● Clustering J(2)EE

● Terracotta in a nutshell.

● Jira clustering issues.

– Files and indexes.

– Stateful applications and home grown caches.

– Thread and services.

– HTTP Session.

● Summary.




Why clustering?

● Horizontal scalability:

– Scale out.

– More computers, to improve throughput when a single one is not enough or costs too much.

● High availability:

– More computers to improve uptime.

– If you unplug a network cable, the system should remain up and running.

– 24/7, or close to it.

– Usually more important than scalability.




Clustering J(2)EE

● In an ideal world

– <distributable /> tag in your web.xml

– Serializable objects in your HTTP session.

● True if and only if the application is J(2)EE compliant.

– Basically, no arbitrary use of resources and state:

    ● Files.

    ● Threads.

    ● Sockets.

    ● ... ?
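
The "serializable objects in your HTTP session" requirement can be checked outside a container. The following is a minimal sketch, not container code: `UserPreferences` and `SessionCheck.roundTrip` are hypothetical names, and the round trip through Java serialization only approximates what a container may do when it replicates a session.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical session attribute: every field is itself serializable.
class UserPreferences implements Serializable {
    private static final long serialVersionUID = 1L;

    final String userName;
    final int pageSize;

    UserPreferences(String userName, int pageSize) {
        this.userName = userName;
        this.pageSize = pageSize;
    }
}

// Hypothetical helper: copies an object through Java serialization.
class SessionCheck {
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            // A NotSerializableException ends up here: the attribute cannot be replicated.
            throw new IllegalStateException("Not safely serializable: " + value.getClass(), e);
        }
    }
}
```

Running every session attribute through such a check in a unit test is one cheap way to catch a non-serializable field before it reaches a cluster.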




Clustering J(2)EE

● What do I do with my files?

– java.io.tmpdir

– JNDI lookup

● What do I do with the state of my application (caches, conversational state, etc.)?

– Stateful Enterprise Java Beans

– Well established caching frameworks

    ● EHCache, OSCache, JBoss Cache

    ● JSR 107




Clustering J(2)EE

● What do I do with my thread/services?

– JMS (MDBs and topics, mostly)

– CommonJ (BEA and IBM effort)

● What do I do with my HTTP Session?

– Serializable objects.

– Use a good Load Balancer.




Wake up!

● Almost all successful J(2)EE applications around won't pass the Sun AVK (Application Verification Kit).

● Most people go straight for the simple solution

– and that one could be a cluster antipattern

– Home grown caches, Lucene indexes, Quartz jobs, singletons... add your favourite quickie here.




Enter Terracotta

● Transparent (Translucid? ...) Clustering.

– Very few changes to existing code.

– Low development effort.

● Open Source, free for any use.

● Emerging (and cool!) technology.

● Did I mention that we are a Terracotta partner? :)




The quest for antipatterns

● Jira is NOT easily clusterable, so it is a nice testbed.

● Jira is a bug tracking, issue tracking, and project management application developed to make this process easier.

● Jira is the leading issue tracker in the open source world (though it is not strictly open source).

● People are asking for a clustered Jira!

– http://jira.atlassian.com/browse/JRA-7330

● Did I mention that we are an Atlassian partner?





Terracotta magic

● Terracotta moves around the bytes changed in shared objects

– No serialization.

– superstatic objects!

– Same semantics; only new() behaves differently.

● Transactions are demarcated by guarded blocks.

– Essentially moves multithreaded application semantics to the cluster level.

● For performance reasons, for certain objects it moves behaviour and not data (logically managed vs. physically managed objects).

– You can do the same thing if you need to (distributed methods).




Terracotta in a nutshell

● Features, part one:

– Transparent JVM-level clustering.

    ● Transparently works inside your JVM as an infrastructure service.

    ● Plugs into your code thanks to bytecode injection.

    ● No API, no code changes!

– Hub-and-Spoke architecture.

    ● Central server based architecture.

    ● All nodes talk only to the central server.

    ● Linear scalability.

    ● No split-brain problem.





● Features, part two:

– Active/Passive mode.

    ● One central active server, n passive servers.

– Network Attached Memory.

    ● Shares your object graph with the central server.

    ● Virtual Heap (on disk, with Berkeley DB).

    ● Maintains your object graph in the memory heap.

– Preserved Java semantics.

    ● Object equality (equals, hashCode).

    ● Concurrency (synchronized, java.util.concurrent).

    ● Thread communication (wait, notify).





● Main concepts:

– Roots.

    ● Define where your shared object graph starts.

– Locks.

    ● Ensure data consistency.

    ● Enable Terracotta inter-node communication.

    ● All code changing parts of the shared object graph must be guarded by locks.

– Distributed methods.

    ● Enable plain old Java methods to be called simultaneously on all cluster nodes.
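
The concepts above map onto plain Java constructs. The sketch below is an illustration in ordinary single-JVM Java and uses no Terracotta API or configuration: `JobQueue` and `JobQueueDemo` are hypothetical names, and the assumption is that the `jobs` field would be the one declared as a root, with the synchronized blocks acting as the locks.

```java
import java.util.LinkedList;

// Hypothetical work queue. In a clustered setup the "jobs" field is the kind
// of field that would be declared as a root; here it is an ordinary field.
class JobQueue {
    private final LinkedList<String> jobs = new LinkedList<String>();

    // Every change to the shared graph happens inside a guarded block.
    void submit(String job) {
        synchronized (jobs) {
            jobs.addLast(job);
            jobs.notifyAll(); // wake up any waiting consumer
        }
    }

    // Blocks until a job is available, using plain wait/notify.
    String take() throws InterruptedException {
        synchronized (jobs) {
            while (jobs.isEmpty()) {
                jobs.wait();
            }
            return jobs.removeFirst();
        }
    }
}

class JobQueueDemo {
    // Runs one consumer thread against one submitted job and returns what was consumed.
    static String runOnce() {
        final JobQueue queue = new JobQueue();
        final String[] result = new String[1];
        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    result[0] = queue.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        consumer.start();
        queue.submit("reindex");
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }
}
```

The point of the sketch is that nothing in it is cluster specific: the same guarded blocks that make it correct across threads are what a lock-based clustering layer would build on.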




Out in the wild

How did we actually cluster the beast?




Clustering Lucene indexes : Problems

● Lucene indexes are typically stored in files.

– Do you remember? clustering antipattern

● Used to improve data access speed.

● How to cluster them?

– Network based solution : SAN or NFS.

    ● Not a viable solution due to locks.

– Messaging based solution : JMS.

    ● Complicated!

    ● Indexes should improve performance, rather than make it worse!




Clustering Lucene indexes : Solution

● Let's store indexes in memory!

● Lucene:

– Provides support for memory-based indexes.

– Just use org.apache.lucene.store.RAMDirectory.

● Terracotta:

– Just a matter of configuration.

– And you can share your Lucene indexes.




Clustering Jira caches : Problems

● Guess what ... Jira uses home grown caches!

– Do you remember? clustering antipattern

– From bad to worse:

    ● No unified API!

        – Just a lot of HashMaps and HashSets.

    ● Very poor locking policies.

        – Makes configuration-only Terracotta clustering impossible!

– Unfeasible to use an existing caching framework.




Clustering Jira caches : Solution

● Write a new, ad-hoc, unified caching API.

● Goals:

– Simplicity.

    ● As simple as using a HashMap.

– Thread safety.

    ● Cache consistency.

    ● Terracotta ready.

– Efficiency.

    ● No bottlenecks.

    ● No liveness failures.




Caching API : Striving for simplicity.

● No strange methods. No cluster related configuration.

– Just the usual GET/PUT methods, and the like.

– Terracotta makes the clustering work!

● When choosing how to cluster the cache:

– Distribute behaviour, rather than data.

    ● Jira puts heavyweight objects in cache.

– Distribute cache invalidation, rather than cache updates.

    ● Lower hit ratio but ...

        – Lower network traffic!

        – Higher simplicity!
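
The trade-off of distributing invalidations instead of updates can be sketched in plain Java. This is an assumption-laden illustration, not the Scarlet caching API: `NodeCache` is a hypothetical class, and the in-process list of peers stands in for whatever mechanism (for example a distributed method) would carry the invalidation between nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-node cache: values stay local, only invalidations travel.
class NodeCache<K, V> {
    private final Map<K, V> local = new HashMap<K, V>();
    private final List<NodeCache<K, V>> peers = new ArrayList<NodeCache<K, V>>();

    // Stand-in for cluster membership.
    synchronized void addPeer(NodeCache<K, V> peer) {
        peers.add(peer);
    }

    synchronized V get(K key) {
        return local.get(key);
    }

    // Stores the value locally and asks every peer to forget its stale copy.
    void put(K key, V value) {
        List<NodeCache<K, V>> snapshot;
        synchronized (this) {
            local.put(key, value);
            snapshot = new ArrayList<NodeCache<K, V>>(peers);
        }
        // Peers are notified outside our own lock so two nodes never wait on each other.
        for (NodeCache<K, V> peer : snapshot) {
            peer.invalidate(key);
        }
    }

    // The only message that crosses nodes: a key, never the heavyweight value.
    synchronized void invalidate(K key) {
        local.remove(key);
    }
}
```

After a put on one node, the other nodes miss on that key and have to reload it themselves: that is the lower hit ratio, paid in exchange for never shipping the cached object over the network.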




Caching API : Striving for thread safety.

● Carefully use Java locks (ok, this was obvious ...).

● Due to how Jira works:

– The caching API must be able to group more than one cache under the same lock.

– The caching API must be able to execute a code block atomically under the same lock.

– Not so obvious ...

● Use what we call “owner based locking.”
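
The slides do not show how owner based locking is implemented, so the following is only one possible reading of the two requirements above. `CacheOwner` and `OwnedCache` are hypothetical names: every cache created by an owner synchronizes on that owner, so several caches share one lock and a block of code can run atomically across all of them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical owner: its monitor is the single lock shared by all its caches.
class CacheOwner {
    // Creates a cache that locks on this owner rather than on itself.
    <K, V> OwnedCache<K, V> newCache() {
        return new OwnedCache<K, V>(this);
    }

    // Runs a block atomically with respect to every cache of this owner.
    void atomically(Runnable block) {
        synchronized (this) {
            block.run();
        }
    }
}

class OwnedCache<K, V> {
    private final CacheOwner owner;
    private final Map<K, V> entries = new HashMap<K, V>();

    OwnedCache(CacheOwner owner) {
        this.owner = owner;
    }

    V get(K key) {
        synchronized (owner) {
            return entries.get(key);
        }
    }

    void put(K key, V value) {
        synchronized (owner) {
            entries.put(key, value);
        }
    }
}
```

Because Java monitors are reentrant, cache calls made inside `atomically` simply re-acquire the lock the calling thread already holds, so two related caches can be updated together without another thread seeing one updated and the other not.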




Caching API : Striving for efficiency.

● Choose the right balance between too fine grained and too coarse grained locks.

– Do not use complex lock constructs.

    ● Use plain synchronized blocks.

– Use lock striping techniques.
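
Lock striping with plain synchronized blocks can be sketched as follows. `StripedCache` is a hypothetical class, not the Scarlet implementation, and the number of stripes is an arbitrary tuning parameter: more stripes mean less contention but more lock objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical striped cache: keys are spread over a fixed number of segments,
// each guarded by its own plain synchronized block.
class StripedCache<K, V> {
    private final List<Map<K, V>> segments = new ArrayList<Map<K, V>>();

    StripedCache(int stripes) {
        if (stripes <= 0) {
            throw new IllegalArgumentException("stripes must be positive");
        }
        for (int i = 0; i < stripes; i++) {
            segments.add(new HashMap<K, V>());
        }
    }

    private Map<K, V> segmentFor(K key) {
        // Mask the sign bit so the index is never negative.
        int index = (key.hashCode() & 0x7fffffff) % segments.size();
        return segments.get(index);
    }

    V get(K key) {
        Map<K, V> segment = segmentFor(key);
        synchronized (segment) {
            return segment.get(key);
        }
    }

    void put(K key, V value) {
        Map<K, V> segment = segmentFor(key);
        synchronized (segment) {
            segment.put(key, value);
        }
    }
}
```

Threads working on keys that fall into different segments never contend, while each individual lock stays a simple synchronized block. The segment list itself is filled in the constructor and never modified afterwards.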




Threads and services

● Jira periodically triggers threads:

– Do you remember? clustering antipattern

● Threaded Jira services:

– Mail sending.

– Backup export.

– Index optimization




Clustering threads and services : Problems

● Threads cannot be clustered.

● We have to cluster the launched services.

– Some services must be shared among cluster nodes.

– Other services must be distributed.

– How to distinguish them?




Clustering threads and services : Solution

● Shared services.

– Clustered through Terracotta XML configuration.

– A shared service is executed only on a single node.

– The default.

● Distributed services.

– Distributed through Terracotta XML configuration.

– A distributed service is executed on every node.

– Just implement com.atlassian.jira.service.JiraDistributedService
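
The difference between shared and distributed services can be simulated in a single JVM. Everything in this sketch is hypothetical: `Service`, `DistributedService` and `ClusterScheduler` are illustrative names, the marker interface merely stands in for the idea behind `JiraDistributedService` (whose actual signature is not shown in these slides), and the `claimed` set plays the role of state that would be shared across the cluster.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical service contract; not the Jira interface.
interface Service {
    String name();
}

// Hypothetical marker: services implementing it run on every node.
interface DistributedService extends Service {
}

// Simulates the scheduling decision for one round across all nodes.
class ClusterScheduler {
    // Stand-in for cluster-wide shared state: shared services already claimed.
    private final Set<String> claimed = new HashSet<String>();
    private final List<String> log = new ArrayList<String>();

    // Called by each node for each service; returns true if the node ran it.
    synchronized boolean runOn(String node, Service service) {
        boolean distributed = service instanceof DistributedService;
        // A shared service runs only for the first node that claims it.
        if (!distributed && !claimed.add(service.name())) {
            return false;
        }
        log.add(node + ":" + service.name());
        return true;
    }

    synchronized List<String> log() {
        return new ArrayList<String>(log);
    }
}
```

With two nodes, a shared service appears once in the log and a distributed one appears once per node, which mirrors the "single node" versus "every node" distinction above.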




HTTP Session

● Two choices:

– Cluster it through Terracotta.

    ● Very hard.

        – Again, Jira puts a lot of heavyweight objects into session.

– Leave it unclustered.

    ● Use a load balancer with sticky sessions enabled.

        – Jira is not a mission critical application.

        – More simplicity, less complexity.

● Guess what we chose ...

– Please give me that shiny new load balancer ...
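
What "sticky sessions" means can be shown with a toy router. This is a sketch of the idea only, not the behaviour of any particular load balancer: `StickyRouter` is a hypothetical class that assigns new sessions round-robin and then keeps sending each session id to the node it was first given.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sticky router: the first request of a session picks a node
// round-robin, later requests with the same session id go to the same node.
class StickyRouter {
    private final List<String> nodes;
    private final Map<String, String> assignments = new HashMap<String, String>();
    private int next = 0;

    StickyRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<String>(nodes);
    }

    synchronized String route(String sessionId) {
        String node = assignments.get(sessionId);
        if (node == null) {
            node = nodes.get(next);
            next = (next + 1) % nodes.size();
            assignments.put(sessionId, node);
        }
        return node;
    }
}
```

Since the session lives on one node only, losing that node loses its sessions. That is the price of leaving the session unclustered, accepted here because Jira is not a mission critical application.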




Dealing with external code

● Applications are often pluggable.

● Jira has a rich plugin architecture.

● External plugins must fit and work into the cluster

– It is necessary to provide simple APIs or configuration options for making cluster-ready plugins.

● Practical example : com.atlassian.jira.service.JiraDistributedService




Toward an end

Conclusions




Summary

● Terracotta is a transparent clustering solution but ...

– You have to make a lot of decisions and trade-offs.

● If you have to access files in a clustered environment:

– Slow access: network filesystem, database system.

– Fast access: use Terracotta network attached memory.

● If you have to cluster your application state:

– Carefully make it thread safe.

– Choose between distributing data or behaviour.




Summary

● If you have application services:

– Choose services to share.

    ● A shared service runs once per cluster.

– Choose services to distribute.

    ● A distributed service runs once per node.

● If you have to cluster the HTTP session state:

– Consider not clustering it!

● If you have to deal with application plugins:

– Provide API hooks or configuration options.




Terracotta + Jira = Scarlet

● Scarlet.

– Clusters Jira through Terracotta.

– Published as a Jira extension.

    ● http://confluence.atlassian.com/x/woQuBg

– Open Source.

    ● We want you!

– Actively developed:

    ● November 06, 2007 : 1.0 Beta 1.

    ● Very soon : 1.0 Beta 2.





The end

Q&A

