Scale Out and Conquer: Architectural Decisions Behind Distributed...
Transcript of Scale Out and Conquer: Architectural Decisions Behind Distributed...
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Scale Out and Conquer: Architectural
Decisions Behind Distributed In-Memory
Systems
Denis Magda
Apache Ignite PMC Chair
GridGain Product Management
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Agenda
• Partitioning – Pitfalls of Even Distribution
• Affinity Collocation – Laid Out in Numbers
• Multi-threading of Distributed Systems – Things to Know
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Memory-centric distributed
database, caching, and processing platform
Memory-centric platform based on Apache Ignite
adds enterprise features and support
for mission critical deployments
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Among Top 5 Apache Projects
Total number of Apache Projects: 318Top-Level Apache Projects: 193
Top 5 Apache Projects by Commits
1. Hadoop2. Ambari3. Lucene-Solr4. Camel5. Ignite
Top 10 most active Apache mailing lists 1. Flex2. Lucene3. Ignite4. Kafka5. Geode6. Flink7. Tomcat8. Cassandra9. Beam10. Sentry
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Partitioning – Pitfalls of Even Distribution
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Where request goes?
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Affinity Function
Key Partition
Server Node
ON-DISK
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Affinity Function
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Naïve Affinity
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Naïve Function: New Node
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Naïve Function: What’s Wrong?
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Naïve Affinity: Redundant Partitions Re-shuffling!
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Productized Affinity Functions
• Consistent hashing [1]
• Rendezvous hashing (HRW) [2]
[1] https://en.wikipedia.org/wiki/Consistent_hashing
[2] https://en.wikipedia.org/wiki/Rendezvous_hashing
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Rendezvous Affinity: Weight Calculation
WEIGHT(partition, node)
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Rendezvous Affinity: Weight Calculation
WEIGHT(partition, node)
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Rendezvous Affinity: Weight Calculation
WEIGHT(partition, node)
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Rendezvous Affinity: Two Nodes to Three
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Rendezvous Affinity: Distribute Evenly at Best
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Rendezvous Affinity: Possible Distribution
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Affinity Collocation – Laid Out in Numbers
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Running Transaction: No Collocation
1: class Account {
2: long id;
3: long city;
4: }
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Running Transaction: No Collocation
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Non Collocated Data: Number of Operations
2 (2 nodes)
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Non Collocated Data: Number of Operations
2 (2 nodes)
2 (primary + backup)
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Non Collocated Data: Number of Operations
2 (2 nodes)
2 (primary + backup)
2 (two-phase commit)
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Non Collocated Data: Number of Operations
2 (2 nodes)
2 (primary + backup)
2 (two-phase commit)
2 (request-response)
--
16 network operations
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Network Operations!
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Running Transaction: Collocated Data
1: class Account{
2: long id;
3:
4: @AffinityKeyMapped
5: long city;
6: }
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Running Transaction: Collocated Data!
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Collocated Data: Number of Operations
1 (1 node)
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Collocated Data: Number of Operations
1 (1 node)
2 (primary + backup)
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Collocated Data: Number of Operations
1 (1 node)
2 (primary + backup)
1 (one-phase commit)
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Collocated Data: Number of Operations
1 (1 node)
2 (primary + backup)
1 (one-phase commit)
2 (request-response)
--
4 network operations
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Let’s Run an SQL Query – No Collocation
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Executing SQL: Full Scan
• 1/3x latency
• 3x throughput
© 2018 GridGain Systems, Inc. GridGain Company Confidential
What About Indexes?
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Index Search Complexity
log 1_000_000 ≈ 20
vs.
log 333_333 ≈ 18
log 333_333 ≈ 18
log 333_333 ≈ 18
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Executing SQL: Indexes Search
• ~same latency
• ~same throughput
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Let’s Collocate and Run SQL Again
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Executing SQL: Indexed Search and Collocation
• same latency
• 3x throughput
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Multi-threading of Distributed Systems – Things to Know
© 2018 GridGain Systems, Inc. GridGain Company Confidential
How Requests/Tasks/Queries Processed?
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Requests Processing: Bottleneck!
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Making Situation Worse
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Cluster Node Chokes!
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Tread-Per-Partition and Workers
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Thread-Per-Partition and I/O Queues
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Thread-Per-Partition: Not a Magic Bullet
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Thread-Per-Partition: Not a Magic Bullet
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Upshot
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Upshot
• Partitioning – it’s not about even distribution only
• Affinity Collocation – a golden concept of distributed systems
• Business Data Model should be adopted/reconsidered
• Thread-per-partition – speeds up most but not all usage scenarios
© 2018 GridGain Systems, Inc. GridGain Company Confidential
These Days in New York
Docker NYC Meetup, Tomorrow, 6:00 PM
• Location: PeerSpace, 1740 Broadway, 15th Floor
• Distributed Database DevOps Dilemmas?
Kubernetes to the rescue
https://ignite.apache.org/events.html
© 2018 GridGain Systems, Inc. GridGain Company Confidential
Any Questions?
#apacheignite
#TheASF
Thank you for joining us. Follow the conversation.https://ignite.apache.org
#gridgain
#denismagda
https://www.meetup.com/NYC-In-Memory-Computing-Meetup/