Towards Improved Data Dissemination of Publish-Subscribe Systems
![Page 1: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/1.jpg)
Towards Improved Data Dissemination
of Publish-Subscribe Systems
Srinath Perera, Ramith Jayasinghe, Dinesh Gamage
Lanka Software Foundation
![Page 2: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/2.jpg)
Outline
● Outline of Pub/Sub Paradigm
● 3 challenges
  ● Avoiding Blocking IO
  ● Avoiding message accumulation through parallel message delivery
  ● Working around slow and unreliable consumers/publishers
● OGCE workflow suite
● Conclusions
![Page 3: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/3.jpg)
Publish/ Subscribe Paradigm and Message Broker
● Many event sources generate events
● Subscribers register their interest through subscriptions
● The broker matches events against subscriptions and delivers them to subscribers
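The matching step above can be sketched as a toy in-process broker. This is only an illustration in Python, not the WS-Messenger code: the `Broker` class, its method names, and the exact-match topic rule are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, str], None]  # (topic, message) -> None


class Broker:
    """Toy broker (hypothetical): exact-match topics, synchronous delivery."""

    def __init__(self) -> None:
        self._subscriptions: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A subscription records one consumer's interest in one topic.
        self._subscriptions[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # Match the event against subscriptions and deliver to each match.
        handlers = self._subscriptions.get(topic, [])
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


received = []
broker = Broker()
broker.subscribe("weather", lambda t, m: received.append(("alice", t, m)))
broker.subscribe("weather", lambda t, m: received.append(("bob", t, m)))
delivered = broker.publish("weather", "storm warning")  # two subscribers match
ignored = broker.publish("traffic", "all clear")        # nobody subscribed
```

Publishers and subscribers never refer to each other here; they only share the topic name, which is the decoupling the paradigm is meant to provide.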
![Page 4: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/4.jpg)
Motivating Usecase
![Page 5: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/5.jpg)
Goals of Message Broker
● High throughput
● Preserve message order for messages generated from the same event source
● Reduce publish-to-delivery time
![Page 6: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/6.jpg)
Basic Message Broker Architecture
![Page 7: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/7.jpg)
Goals of this Paper
● Three architectural challenges
  ● Blocking IO
  ● Parallel message delivery
  ● Unreliable consumers
● We explore architectural options for addressing these challenges
● We have improved OGCE Messenger based on our observations and used it as the test bed for this study.
![Page 8: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/8.jpg)
Challenge 1: Blocking IO
● Blocking IO assigns each request to a thread, so the number of parallel clients is limited by the number of threads.
● A potential alternative is non-blocking IO, which uses an event-based model and minimizes thread blocking due to IO.
● A message broker has an IO-dominated workload, and therefore we believed a non-blocking approach could provide major improvements.
● We took advantage of Axis2's pluggable transport architecture and set up the broker with the NIO transport from Apache
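To make the event-based model concrete, here is a minimal sketch using Python's standard `selectors` module. It is not the Axis2 NIO transport; the `handle_ready` helper and the upper-casing "handler" are assumptions for illustration. One thread watches several connections and only does work for the ones that have data.

```python
import selectors
import socket


def handle_ready(sel: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """Serve every socket that is ready right now; return how many were served."""
    served = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())  # stand-in for real request handling
        else:
            sel.unregister(conn)        # peer closed the connection
            conn.close()
        served += 1
    return served


sel = selectors.DefaultSelector()
# socketpair() gives connected sockets without touching the network.
pairs = [socket.socketpair() for _ in range(3)]
for server_side, _client_side in pairs:
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)

# Only two of the three clients send; the idle one ties up no thread.
pairs[0][1].sendall(b"publish a")
pairs[2][1].sendall(b"publish c")
served = 0
while served < 2:
    served += handle_ready(sel)
replies = [pairs[0][1].recv(4096), pairs[2][1].recv(4096)]

sel.close()
for server_side, client_side in pairs:
    server_side.close()
    client_side.close()
```

With blocking IO the idle third connection would occupy a thread for as long as it stays open; here it costs only a registration in the selector.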
![Page 9: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/9.jpg)
Experimental setup
● Loaded the system with XML messages over a constant set of subscriptions
● Loaded the system without loading the network.
● Statistics were calculated periodically (e.g. every 2 seconds)
● 10 topics, 1000 messages per topic, 200 consumers
![Page 10: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/10.jpg)
NIO vs. Blocking IO
● NIO transport increases the throughput of the system.
● NIO is able to handle more concurrent connections (publishers) with fewer resources.
![Page 11: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/11.jpg)
Challenge 2: Message Accumulation
● If the broker receives messages faster than it can deliver them, messages accumulate, and the system slows down and eventually crashes.
● Often one incoming message needs to be delivered to multiple consumers. With a high number of consumers, this problem becomes very likely.
● A single delivery thread could pose major limitations.
● But a naive parallel solution will break the order of message delivery.
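A small worked example shows how fan-out drives accumulation. The function and all the rates below are hypothetical numbers chosen for illustration, not measurements from this study.

```python
def backlog_after(seconds: float, arrival_rate: float, fan_out: int,
                  delivery_rate: float) -> float:
    """Pending deliveries after `seconds`, starting from an empty queue.

    arrival_rate:  incoming messages per second
    fan_out:       consumers each message must reach
    delivery_rate: deliveries per second the broker can complete
    """
    required = arrival_rate * fan_out            # deliveries demanded per second
    growth = max(0.0, required - delivery_rate)  # backlog never goes negative
    return growth * seconds


# Hypothetical load: 100 msg/s to 200 consumers demands 20,000 deliveries/s.
slow = backlog_after(60, arrival_rate=100, fan_out=200, delivery_rate=5_000)
fast = backlog_after(60, arrival_rate=100, fan_out=200, delivery_rate=25_000)
```

Under these assumed rates, a delivery path that completes 5,000 deliveries per second is 900,000 deliveries behind after one minute, while one that completes 25,000 per second keeps up.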
![Page 12: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/12.jpg)
Parallel Message Delivery
● Two parallelization strategies considered
● Topic based
– Each thread is assigned a set of topics or XPath expressions
– A thread delivers a message to consumers if it matches a topic/XPath expression handled by that thread.
● Consumer (EPR) based
– Each active consumer is assigned a standing job and a message queue
– The job delivers the messages accumulated in its message queue
![Page 13: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/13.jpg)
Topic Based Message Delivery
● There is a queue for each topic, and messages matching the topic are placed in that queue.
● A thread assigned to each queue picks up messages and delivers them.
● Concerns
  ● If a message matches multiple subscriptions submitted by the same consumer, it will be delivered multiple times.
  ● If two subscriptions for the same consumer are handled by different threads, how can the system preserve the order of messages?
![Page 14: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/14.jpg)
Consumer (EPR) based parallelization
● There is a queue for each consumer.
● Messages for that consumer are placed on that queue.
● A thread assigned to the queue delivers messages in the queue to the consumer.
● Facts
  ● Since only one thread delivers messages to a consumer, order is preserved.
  ● Jobs and queues are created only when messages are available for a consumer.
  ● A queue eventually expires when no messages are available.
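The per-consumer queue idea can be sketched as follows. This is a simplified illustration, not the broker's implementation: `ConsumerDispatcher` is a hypothetical name, it gives every consumer its own thread for brevity, it appends to a list instead of sending over the network, and it omits queue expiry.

```python
import queue
import threading
from typing import Dict, List


class ConsumerDispatcher:
    """One queue and one delivery thread per consumer (identified by EPR)."""

    def __init__(self) -> None:
        self._queues: Dict[str, queue.Queue] = {}
        self._threads: List[threading.Thread] = []
        self._lock = threading.Lock()
        self.delivered: Dict[str, List[str]] = {}

    def enqueue(self, epr: str, message: str) -> None:
        with self._lock:
            if epr not in self._queues:
                # Create the queue and its job lazily, on the first message.
                self._queues[epr] = queue.Queue()
                self.delivered[epr] = []
                worker = threading.Thread(target=self._run, args=(epr,))
                self._threads.append(worker)
                worker.start()
            self._queues[epr].put(message)

    def _run(self, epr: str) -> None:
        q = self._queues[epr]
        while True:
            message = q.get()
            if message is None:  # sentinel: stop this job
                return
            # Stand-in for the network send to the consumer.
            self.delivered[epr].append(message)

    def shutdown(self) -> None:
        with self._lock:
            for q in self._queues.values():
                q.put(None)
        for worker in self._threads:
            worker.join()


dispatcher = ConsumerDispatcher()
for i in range(5):
    for epr in ("http://a.example/ep", "http://b.example/ep"):
        dispatcher.enqueue(epr, f"m{i}")
dispatcher.shutdown()
```

Because exactly one thread ever reads a given consumer's FIFO queue, each consumer sees `m0` through `m4` in publish order, even though the two consumers are served concurrently.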
![Page 15: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/15.jpg)
Scheduling Standing jobs
● We use read-write locks to maximize concurrency
● We cannot assign a static thread to each consumer, as that will not scale to a large number of consumers, so we use a thread pool.
● Dynamic thread pool
  ● A standing job drains the queue entirely and tries to deliver the messages
  ● It will not release control as long as messages are in the queue.
  ● Potential starvation
● Static thread pool (size is configurable)
  ● Each thread iterates over the standing jobs assigned to it.
  ● Each thread drains the queue partially (configurable)
  ● A standing job releases control after delivering the drained messages
● Optimized the system by allowing greater concurrency for message filtering
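The partial-drain behaviour of the static pool can be illustrated with a single-threaded simulation of what one pool thread does. `StandingJob`, `pool_thread`, and `batch_size` are hypothetical names for this sketch, and delivery is replaced by appending to a log.

```python
from collections import deque
from typing import Deque, List, Tuple


class StandingJob:
    """Delivery job for one consumer, holding that consumer's queue."""

    def __init__(self, epr: str) -> None:
        self.epr = epr
        self.pending: Deque[str] = deque()

    def run(self, batch_size: int, log: List[Tuple[str, str]]) -> int:
        # Drain at most batch_size messages, then hand control back.
        sent = 0
        while self.pending and sent < batch_size:
            log.append((self.epr, self.pending.popleft()))
            sent += 1
        return sent


def pool_thread(jobs: List[StandingJob], batch_size: int) -> List[Tuple[str, str]]:
    """Simulate one static-pool thread iterating over its assigned jobs."""
    log: List[Tuple[str, str]] = []
    while any(job.pending for job in jobs):
        for job in jobs:
            job.run(batch_size, log)
    return log


busy, quiet = StandingJob("busy"), StandingJob("quiet")
busy.pending.extend(["b1", "b2", "b3", "b4"])
quiet.pending.append("q1")
order = pool_thread([busy, quiet], batch_size=2)
```

With `batch_size=2` the quiet consumer's message goes out after two of the busy consumer's messages rather than after all four. A job that drained its queue entirely would make the quiet consumer wait for the whole backlog, which is the starvation risk noted for the dynamic pool.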
![Page 16: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/16.jpg)
Performance: Throughput
![Page 17: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/17.jpg)
Performance: Round Trip Time
![Page 18: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/18.jpg)
Performance: Summary
● Both parallel implementations perform better than the serial one (throughput and round trip time)
● The static thread pool reports better round trip times.
● The dynamic thread pool increases throughput
![Page 19: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/19.jpg)
Slow and Unreliable Consumers/Publishers
● Consumers and publishers are distributed in a heterogeneous environment and are unpredictable (beyond the control of the system)
● This affects the performance of the middleware
  ● Round trip time
  ● Throughput
● The delay incurred in delivering to slow consumers is propagated to fast consumers as well.
  ● E.g. a connection timeout blocks the thread until the timeout period elapses
![Page 20: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/20.jpg)
Solution 1: Soft State Subscriptions
● Force consumers to renew their subscriptions
● But the problem persists until the timeout happens.
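A minimal sketch of soft-state subscriptions, with time passed in explicitly so the behaviour is easy to follow. The class name, the `ttl` parameter, and the 30-second value are assumptions for illustration.

```python
from typing import Dict, List


class SoftStateSubscriptions:
    """Subscriptions that lapse unless renewed within `ttl` seconds."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._expires_at: Dict[str, float] = {}

    def renew(self, epr: str, now: float) -> None:
        # Subscribing and renewing are the same operation in this sketch.
        self._expires_at[epr] = now + self.ttl

    def active(self, now: float) -> List[str]:
        # Drop lapsed subscriptions, then report the survivors.
        self._expires_at = {e: t for e, t in self._expires_at.items() if t > now}
        return sorted(self._expires_at)


subs = SoftStateSubscriptions(ttl=30.0)
subs.renew("alive", now=0.0)
subs.renew("dead", now=0.0)
subs.renew("alive", now=20.0)    # "dead" never renews
before = subs.active(now=25.0)   # the dead consumer is still listed
after = subs.active(now=31.0)    # only now is it dropped
```

The window between the consumer failing and its subscription expiring is the limitation named above: during that time the broker still tries to deliver to it.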
![Page 21: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/21.jpg)
Solution 2: Blacklisting Scheme
● Consumers are uniquely identified by their EPR
● If message delivery fails (e.g. times out) repeatedly for a consumer, it is blacklisted
● The system doesn't try to send subsequent messages to blacklisted consumers (for a configurable time period)
● Facts
  ● Minimizes the overhead incurred by message delivery failures
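The blacklisting rule might look like the sketch below. The slides do not give the exact policy, so the details are assumptions: failures are counted consecutively, `max_failures` and `penalty` are hypothetical parameter names, and time is passed in explicitly.

```python
from typing import Dict


class Blacklist:
    """Skip consumers whose deliveries keep failing, for a fixed period."""

    def __init__(self, max_failures: int, penalty: float) -> None:
        self.max_failures = max_failures
        self.penalty = penalty
        self._failures: Dict[str, int] = {}
        self._blocked_until: Dict[str, float] = {}

    def record_failure(self, epr: str, now: float) -> None:
        self._failures[epr] = self._failures.get(epr, 0) + 1
        if self._failures[epr] >= self.max_failures:
            # Too many failures in a row: stop trying for `penalty` seconds.
            self._blocked_until[epr] = now + self.penalty
            self._failures[epr] = 0

    def record_success(self, epr: str) -> None:
        self._failures[epr] = 0  # assumption: only consecutive failures count

    def allowed(self, epr: str, now: float) -> bool:
        return now >= self._blocked_until.get(epr, 0.0)


bl = Blacklist(max_failures=3, penalty=60.0)
epr = "http://slow.example/ep"
for t in (1.0, 2.0):
    bl.record_failure(epr, now=t)
still_allowed = bl.allowed(epr, now=2.5)  # two failures: still delivered to
bl.record_failure(epr, now=3.0)           # third failure triggers the blacklist
blocked = not bl.allowed(epr, now=10.0)
recovered = bl.allowed(epr, now=63.0)     # penalty period has elapsed
```

A delivery thread would call `allowed` before each send and skip the consumer when it returns `False`, so a dead endpoint costs at most `max_failures` timeouts per penalty period instead of one per message.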
![Page 22: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/22.jpg)
WS-Messenger
● Part of the NSF-funded "Open Grid Computing Environments (OGCE)" project and fully open source.
● Implements WS-Eventing and WS-Notification.
● Supports topic-based and XPath-based subscriptions
● New version works on Axis2 http://ws.apache.org/axis2/
● Multiple deployment options
  ● Standalone distribution
  ● Embedded in a servlet container (e.g. Tomcat)
![Page 23: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/23.jpg)
Future Directions
● Improved static thread pool based parallelization
  ● Ensure equal thread utilization
  ● Implement a work stealing mechanism or hand over jobs to idle threads.
● Analyze the performance impact on parallelization strategies when the number of consumers is increased.
  ● Memory requirements
  ● Throughput, round trip time
![Page 24: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/24.jpg)
Important Info
● Open Grid Computing Environments (www.collab-ogce.org) provides a SOA-based workflow suite for scientific use cases.
● WS-Messenger
  ● http://www.collab-ogce.org/ogce/index.php/Messaging
  ● http://www.collab-ogce.org/ogce/index.php/Messaging_User_Guide
![Page 25: Towards Improved Data Dissemination of Publish-Subscribe Systems](https://reader033.fdocuments.in/reader033/viewer/2022051514/54b6d9f94a7959ca538b4694/html5/thumbnails/25.jpg)
Questions?