Infinispan, a distributed in-memory key/value data grid and cache
Transcript of Infinispan, a distributed in-memory key/value data grid and cache
Infinispan
Distributed in-memory key/value data grid and cache
@infinispan
Agenda
• Introduction
• Part 1
• Hash Tables
• Distributed Hash Tables
• Consistent Hashing
• Chord Lookup Protocol
• Part 2
• Data Grids
• Infinispan
• Architecture
• Consistent Hashing / Split Clusters
• Other features
Part I – A (very) short introduction to distributed hash tables
Hash Tables
Source: Wikipedia http://commons.wikimedia.org/wiki/File:Hash_table_5_0_1_1_1_1_1_LL.svg#/media/File:Hash_table_5_0_1_1_1_1_1_LL.svg
Distributed Hash Tables (DHT)
Source: Wikipedia - http://commons.wikimedia.org/wiki/File:DHT_en.svg#/media/File:DHT_en.svg
• Decentralized Hash Table functionality
• Interface
• put(K,V)
• get(K) -> V
• Nodes can fail, join and leave
• The system has to scale
Distributed Hash Tables (DHT)
• Flooding in N nodes
• put() – store in any node O(1)
• get() – send query to all nodes O(N)
• Full replication in N nodes
• put() – store in all nodes O(N)
• get() – check any node O(1)
Simple solutions
Fixed Hashing
NodeID = hash(key) % TotalNodes.
Fixed Hashing with High Availability
NodeID = hash(key) % TotalNodes.
Fixed Hashing and Scalability
NodeID = hash(key) % (TotalNodes + 1).
2 nodes, key space = {0, 1, 2, 3, 4, 5}
NodeID = hash(key) % 2:
N0 (key mod 2 = 0): 0, 2, 4
N1 (key mod 2 = 1): 1, 3, 5
After adding a third node, NodeID = hash(key) % 3:
N0 (key mod 3 = 0): 0, 3
N1 (key mod 3 = 1): 1, 4
N2 (key mod 3 = 2): 2, 5
Only keys 0 and 1 keep their owner; 4 of the 6 keys move to a different node.
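A minimal sketch (plain Java, illustrative names) that counts how many keys change owner when the node count grows from 2 to 3:

public class ModuloRehash {
    public static void main(String[] args) {
        int moved = 0;
        for (int key = 0; key <= 5; key++) {  // key space {0..5} from the slide
            int before = key % 2;             // owning node with 2 nodes
            int after = key % 3;              // owning node with 3 nodes
            if (before != after) moved++;
        }
        System.out.println(moved + " of 6 keys moved");  // prints "4 of 6 keys moved"
    }
}

Consistent hashing, introduced next, avoids exactly this mass migration.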
Consistent Hashing
Consistent Hashing – The Hash Ring
[Hash ring diagram: the hash space is laid out as a circle starting at 0; nodes N0, N1, N2 and keys K1–K6 are hashed onto the ring, and each key belongs to the first node encountered clockwise from its position.]
Consistent Hashing – Nodes Joining, Leaving
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
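A minimal consistent-hash ring sketch (plain Java; class and method names are illustrative, and a toy hash stands in for a real hash function):

import java.util.SortedMap;
import java.util.TreeMap;

public class HashRing {
    // One point per node; real implementations add virtual nodes for balance.
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) { ring.put(hash(node), node); }
    void removeNode(String node) { ring.remove(hash(node)); }

    // Walk clockwise: the first node at or after the key's position owns it.
    String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private int hash(String s) { return s.hashCode() & 0x7fffffff; } // toy hash only
}

When a node joins or leaves, only the keys between it and its neighbor on the ring change owner, instead of almost all keys as with the modulo scheme.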
Chord: Peer-to-peer Lookup Protocol
• Load Balance – distributed hash function, spreading keys evenly over nodes
• Decentralization – fully distributed, no single point of failure (SPOF)
• Scalability – logarithmic growth of lookup cost with the number of nodes, large systems are feasible
• Availability – automatically adjusts its internal tables to ensure the node responsible for a key is always found
• Flexible naming – key-space is flat (flexibility in how to map names to keys)
Chord – Lookup O(N)
Source: "Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications" – Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan
Chord – Lookup O(logN)
Source: "Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications" – Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan
• K = 6, identifier space [0, 2^K − 1] = [0, 63]
• finger[i] = first node that succeeds (n + 2^(i−1)) mod 2^K, where 1 ≤ i ≤ K
• Successor/Predecessor – the next/previous node on the circle
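A small sketch (plain Java, illustrative names) that computes the finger-table targets for a node, following the formula above:

public class FingerTargets {
    static final int K = 6;  // 6-bit identifier space, IDs in [0, 63]

    // Target identifier for finger i of node n: (n + 2^(i-1)) mod 2^K
    static int fingerTarget(int n, int i) {
        return (n + (1 << (i - 1))) % (1 << K);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= K; i++)
            System.out.println("finger[" + i + "] target for node 8: " + fingerTarget(8, i));
        // Each finger then points at the first live node that succeeds its target,
        // which is what makes O(log N) lookups possible.
    }
}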
Chord – Node Join
Source: "Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications" – Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan
• Node 26 joins the system between nodes 21 and 32.
• (a) Initial state: node 21 points to node 32;
• (b) node 26 finds its successor (i.e., node 32) and points to it;
• (c) node 26 copies all keys less than 26 from node 32;
• (d) the stabilize procedure updates the successor of node 21 to node 26.
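The stabilize procedure itself is short; a hedged Java-style sketch following the Chord paper's pseudocode (node references and failure handling are simplified):

class ChordNode {
    int id;
    ChordNode successor, predecessor;

    // Periodically verify our successor and tell it about us.
    void stabilize() {
        ChordNode x = successor.predecessor;
        if (x != null && inOpenInterval(x.id, id, successor.id))
            successor = x;  // a node joined between us and our old successor
        successor.notifyNode(this);
    }

    // Called on the successor: the caller may be its new predecessor.
    void notifyNode(ChordNode n) {
        if (predecessor == null || inOpenInterval(n.id, predecessor.id, id))
            predecessor = n;
    }

    // Is a inside the open interval (lo, hi) on the ring? (wrap-around aware)
    static boolean inOpenInterval(int a, int lo, int hi) {
        return lo < hi ? (a > lo && a < hi) : (a > lo || a < hi);
    }
}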
The world of DHTs …
• CAN (Hypercube), Chord (Ring), Pastry (Tree+Ring), Tapestry (Tree+Ring), Viceroy, Kademlia, Skipnet, Symphony (Ring), Koorde, Apocrypha, Land, Bamboo, ORDI …
Part II – A short introduction to Infinispan
Where do we store data?
One size does not fit all...
Infinispan – History
• 2002 – JBoss App Server needed a clustered solution for HTTP and EJB session state replication in HA clusters. JGroups (an open-source group communication suite) had a replicated map demo, which was expanded into a tree data structure, with eviction and JTA transactions added.
• 2003 – this was moved to JBoss AS code base
• 2005 – JBoss Cache was extracted and became a standalone project
… JBoss Cache evolved into Infinispan, core parts redesigned
• 2009 – JBoss Cache 3.2 and Infinispan 4.0.0.ALPHA1 were released
• 2015 – Infinispan 7.2.0.Alpha1
• Check the Infinispan roadmap for more details
Code?
<dependency>
<groupId>org.infinispan</groupId>
<artifactId>infinispan-embedded</artifactId>
<version>7.1.0.Final</version>
</dependency>
import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

EmbeddedCacheManager cacheManager = new DefaultCacheManager();
Cache<String, String> cache = cacheManager.getCache();
cache.put("Hello", "World!");
Usage Modes
• Embedded / library mode
• clustering for apps and frameworks (e.g. JBoss session replication)
• Local mode single cache
• JSR 107: JCACHE - Java Temporary Caching API
• Transactional local cache
• Eviction, expiration, write through, write behind, preloading, notifications, statistics
• Cluster of caches
• Invalidation, Hibernate 2nd level cache
• Server mode – remote data store
• REST, MemCached, HotRod, WebSocket (*)
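For server mode, a minimal Hot Rod client sketch (assumes the infinispan-client-hotrod module and a server listening on localhost:11222; API names as in the 7.x client):

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

// Connect to a remote Infinispan server over the Hot Rod binary protocol
RemoteCacheManager rcm = new RemoteCacheManager(
    new ConfigurationBuilder().addServer().host("localhost").port(11222).build());
RemoteCache<String, String> remote = rcm.getCache();
remote.put("Hello", "World!");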
Code?
Configuration config = new ConfigurationBuilder()
    .clustering()
        .cacheMode(CacheMode.DIST_SYNC)        // distributed mode, synchronous replication
        .sync()
        .l1().lifespan(25000L)                 // L1 near-cache entries expire after 25 s
        .hash().numSegments(100).numOwners(3)  // 100 hash segments, 3 copies of each entry
    .build();

Configuration config = new ConfigurationBuilder()
    .eviction()
        .maxEntries(20000).strategy(EvictionStrategy.LRU)  // keep at most 20,000 entries in memory
    .expiration()
        .wakeUpInterval(5000L)                 // expiration reaper runs every 5 s
        .maxIdle(120000L)                      // entries idle for 2 minutes are removed
    .build();
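Either configuration can then be registered under a name on the cache manager; a short sketch of the usual pattern ("distributedCache" is an arbitrary name):

// Register the configuration and obtain a cache that uses it
cacheManager.defineConfiguration("distributedCache", config);
Cache<String, String> distCache = cacheManager.getCache("distributedCache");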
Infinispan – Core Architecture
[Architecture diagram: remote apps (C++, Java, .NET) reach the grid over TCP through the server endpoints – MemCached, HotRod, REST, WebSocket (*). Each node is a JVM that can also host an embedded Java app and stacks Notifications, Transactions/XA, Query, Map/Reduce, Monitoring, and a Storage Engine (RAM + overflow) on top of a JGroups transport; the nodes communicate over TCP/UDP.]
Infinispan Clustering and Consistent Hashing
• JGroups Views
• Each node has a unique address
• View changes when nodes join, leave
• Keys are hashed with the MurmurHash3 algorithm
• The hash space is divided into segments
• Key → Segment → Owners
• Primary and Backup Owners
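As a simplified sketch of that chain (illustrative only – the real ConsistentHash divides the hash space into segment ranges rather than taking a modulo, and the owner lists come from the current topology):

// Key -> Segment -> Owners, greatly simplified
int numSegments = 100;
int h = murmurHash3(key) & Integer.MAX_VALUE;    // assume a MurmurHash3 implementation
int segment = h % numSegments;
List<Address> owners = consistentHash.locateOwnersForSegment(segment);
Address primary = owners.get(0);                 // writes go through the primary owner
// the remaining owners hold the backup copies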
Does it scale?
• 320 nodes, 3000 caches, 20 TB RAM
• Largest cluster formed: 1000 nodes
Empty Cluster
[Diagram: a cluster of nodes holding no entries]
Add 1 Entry
[Diagram: K1 is stored on one node of the cluster]
Primary and Backup
[Diagram: K1 is copied to a second node – a primary owner plus a backup]
Add another one
[Diagram: K2 is stored alongside the two copies of K1]
Primary And Backup
[Diagram: K2 also gets a backup copy on another node]
A cluster with more keys
[Diagram: K1–K5, each with a primary and a backup copy spread across the nodes]
A node dies…
[Diagram: one node fails, taking its copies of some keys with it]
The cluster heals
[Diagram: the surviving copies are re-replicated so every key again has two copies]
If multiple nodes fail…
• CAP Theorem to the rescue:
• Formulated by Eric Brewer in 1998
• C - Consistency
• A - High Availability
• P - Tolerance to Network Partitions
• Can only satisfy 2 at the same time:
• Consistency + Availability: The Ideal World where network partitions do not exist
• Partitioning + Availability: Data might be different between partitions
• Partitioning + Consistency: Do not corrupt data!
Infinispan Partition Handling Strategies
• In the presence of network partitions
• Prefer availability (partition handling DISABLED)
• Prefer consistency (partition handling ENABLED)
• Split Detection with partition handling ENABLED:
• Ensure stable topology
• Nodes lost ≥ numOwners OR no simple majority
• Check segment ownership
• Mark partition as Available / Degraded
• Send PartitionStatusChangedEvent to listeners
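A hedged sketch of reacting to those events with a cache listener (annotation and event names as in the 7.x notification API):

import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachelistener.annotation.PartitionStatusChanged;
import org.infinispan.notifications.cachelistener.event.PartitionStatusChangedEvent;

@Listener
public class PartitionListener {
    @PartitionStatusChanged
    public void onPartitionStatusChange(PartitionStatusChangedEvent<?, ?> event) {
        // AVAILABLE or DEGRADED_MODE, depending on what split detection decided
        System.out.println("Availability is now: " + event.getAvailabilityMode());
    }
}

// registered with: cache.addListener(new PartitionListener());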
Cluster Partitioning – No data lost
[Diagram: the cluster splits into Partition1 and Partition2, but every key K1–K5 still has at least one surviving copy]
Cluster Partitioning – Lost data
[Diagram: the cluster splits into Partition1 and Partition2 and some keys lose all of their copies]
Merging Split Clusters
• Split Clusters see each other again
• Step 1: Ensure stable topology
• Step 2: Automatic – based on partition state
• 1 Available -> attempt merge
• All Degraded -> attempt merge
• Step 3: Manual
• Data was lost
• Custom listener on Merge
• Application decides
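A hedged sketch of such a custom merge listener (cache-manager-level notification; the reconciliation logic itself is up to the application):

import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachemanagerlistener.annotation.Merged;
import org.infinispan.notifications.cachemanagerlistener.event.MergeEvent;

@Listener
public class MergeListener {
    @Merged
    public void onMerge(MergeEvent event) {
        // The subgroups that just merged back; the application decides how to reconcile
        System.out.println("Merged subgroups: " + event.getSubgroupsMerged());
    }
}

// registered with: cacheManager.addListener(new MergeListener());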
Querying Infinispan
• Apache Lucene Index
• Native Query API (Query DSL)
• Hibernate Search and Apache Lucene to index and search
• Native Map/Reduce
• Index-less
• Distributed Execution Framework
• Hadoop Integration (WIP)
• Run existing map/reduce jobs on Infinispan data
Map Reduce:
MapReduceTask<String, String, String, Integer> mapReduceTask =
    new MapReduceTask<>(wordCache);
mapReduceTask
    .mappedWith(new WordCountMapper())
    .reducedWith(new WordCountReducer());
Map<String, Integer> wordCountMap = mapReduceTask.execute();
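A sketch of what WordCountMapper and WordCountReducer might look like against Infinispan's distexec Mapper/Reducer interfaces (the word-splitting logic is illustrative):

import java.util.Iterator;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;

class WordCountMapper implements Mapper<String, String, String, Integer> {
    @Override
    public void map(String key, String value, Collector<String, Integer> collector) {
        for (String word : value.split("\\s+"))
            collector.emit(word, 1);            // one count per word occurrence
    }
}

class WordCountReducer implements Reducer<String, Integer> {
    @Override
    public Integer reduce(String word, Iterator<Integer> counts) {
        int sum = 0;
        while (counts.hasNext()) sum += counts.next();  // total per word
        return sum;
    }
}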
Query DSL:
QueryParser qp = new QueryParser("default", new StandardAnalyzer());
Query luceneQ = qp.parse(
    "+station.name:airport +year:2014 +month:12 +(avgTemp < 0)");
CacheQuery cq = Search.getSearchManager(cache)
    .getQuery(luceneQ, DaySummary.class);
List<Object> results = cq.list();
Other features
• JMX Management
• RHQ (JBoss Enterprise Management Solution)
• CDI Support
• JSR 107 (JCACHE) integration
• Custom interceptors
• Runs on Amazon Web Services Platform
• Command line client
• JTA with JBoss TM, Bitronix, Atomikos
• GridFS (experimental API), CloudTM, Cross Site Replication
DEMO
Q & A
Thank you!
Resources:
http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
http://pdos.csail.mit.edu/papers/ton:chord/paper-ton.pdf
http://www.martinbroadhurst.com/Consistent-Hash-Ring.html
http://infinispan.org/docs/7.2.x/user_guide/user_guide.html
https://github.com/infinispan/infinispan/wiki