ASTRI Proprietary
ASTRI Distributed Stream Computing Platform
Dr. Kent Wu
Feb. 1, 2013
Introduction

• Motivation
– A general-purpose, distributed, scalable, fault-tolerant, managed platform that allows programmers to easily develop applications for processing continuous, unbounded streams of data.
– Fills the gap between complex proprietary systems and batch-oriented computing platforms, such as Hadoop.

• Applications
– Real-time data analysis (financial data, Twitter feeds, news data, …)
– High-frequency trading
– Network intrusion detection
– Complex event processing
– Real-time search, social networks
– …
What is stream data?

• Runner-up Finalist title, 2012 DEBS Grand Challenge Competition
– The 6th ACM International Conference on Distributed Event-Based Systems, Berlin, Germany

• Large hi-tech manufacturing equipment monitoring
– 1,000 pieces of equipment, 50 terabytes every day, 5 million events per second

• Competition data
– 18 non-consecutive days
– Monitoring a range of parameters and triggering conditions
– 77,576,214 events total
– 14.2 GB total data size
Traditional Database Architecture

• Problem
– “The high latency of the response (which currently reaches 30 minutes) is the major factor increasing the severity of the KPI (Key Performance Indicators) violations and their direct monetary costs.”
Stream-Based Architecture

• Challenges
– Accuracy
– Throughput
– Latency
Query 1
Query 2
Stream Processing System

[Diagram: multiple data sources feed a continuous stream of events into a single stream processing system, which evaluates a set of standing queries over the incoming data]
Distributed Stream Processing System

[Diagram: the same queries and data stream of events, now partitioned across multiple machines (node1, node2, …, nodeN), each node running a share of the queries over a share of the stream]
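A common way to distribute such a system is to partition the event stream by key, so that all events for the same key are routed to the same node and per-key state never needs cross-node coordination. A minimal illustrative sketch (the node count and key type are assumptions, not part of the ASTRI platform API):

```java
public class KeyPartitioner {
    private final int numNodes;

    public KeyPartitioner(int numNodes) {
        this.numNodes = numNodes;
    }

    // Map a key deterministically to one of N nodes.
    // Math.floorMod guards against negative hashCode() values.
    public int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), numNodes);
    }
}
```

Because the mapping is deterministic, every event carrying the same key lands on the same node, which is what makes per-key aggregations (counts, windows) local operations.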
Problems

• Scaling is painful
• Poor fault-tolerance
• Coding is tedious
MapReduce

• Scalability to large data volumes:
– Scan 100 TB on 1 node @ 50 MB/s = 24 days
– Scan on 1000-node cluster = 35 minutes

• Cost-efficiency:
– Commodity nodes (cheap, but unreliable)
– Commodity network (low bandwidth)
– Automatic fault-tolerance (fewer admins)
– Easy to use (fewer programmers)
MapReduce Programming Model

• Data type: key-value records (key, value)
• Map function: Map(k1, v1) → list(k2, v2)
• Reduce function: Reduce(k2, list(v2)) → list(v3)
Example: Word Count

• The prototypical MapReduce example counts the appearances of each word in a set of documents

function map(String name, String document):
  // name: document name
  // document: document contents
  for each word w in document:
    emit (w, 1)

function reduce(String word, Iterator partialCounts):
  // word: a word
  // partialCounts: a list of aggregated partial counts
  sum = 0
  for each pc in partialCounts:
    sum += ParseInt(pc)
  emit (word, sum)
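The pseudocode above can be simulated in a single process: the map phase emits (word, 1) pairs, grouping by key plays the role of the shuffle, and the reduce phase sums the partial counts per word. A self-contained sketch in Java (an in-memory illustration, not a distributed implementation):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCount {
    // Simulates map -> shuffle -> reduce in one process.
    public static Map<String, Integer> count(List<String> documents) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : documents) {
            // map: emit (word, 1) for each word in the document
            for (String word : doc.split("\\s+")) {
                if (!word.isEmpty()) {
                    // shuffle + reduce: group by key and sum partial counts
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Running it on the three sample lines from the next slide yields the same totals the execution diagram shows (e.g. "the" → 3, "fox" → 2).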
Word Count Execution

Input                      Map output
"the quick brown fox"      (the,1) (quick,1) (brown,1) (fox,1)
"the fox ate the mouse"    (the,1) (fox,1) (ate,1) (the,1) (mouse,1)
"how now brown cow"        (how,1) (now,1) (brown,1) (cow,1)

Shuffle & sort partitions the keys across two reducers, which emit:

Reducer 1 → brown,2  fox,2  how,1  now,1  the,3
Reducer 2 → ate,1  cow,1  mouse,1  quick,1

(Pipeline stages: Input → Map → Shuffle & Sort → Reduce → Output)
Twitter trending example

• Compute trending Twitter topics by listening to the spritzer stream from the Twitter API

[Pipeline diagram: Adapter → ExtractorPE (tweet events) → CountApp1 (CountPE1 … CountPEn) and CountApp2 (CountPEn+1 … CountPEn+m) → TopNTopicPE]
Twitter-adapter

public class TwitterInputAdapter extends LoadControlAdapterApp {

    private LinkedBlockingQueue<Status> messageQueue = new LinkedBlockingQueue<Status>();

    public void connectAndRead() throws Exception {
        …
        messageQueue.add(status);
        …
    }

    public void initQueue() {
        loadController.initBlockingQueue(messageQueue);
    }

    public void initLoadController() {
        loadController = new EventQueueLoadController();
    }

    class Dequeuer implements Runnable {
        public void run() {
            while (true) {
                try {
                    LoadControlObject lco = (LoadControlObject) messageQueue.take();
                    Status status = (Status) lco.getObj();
                    Event event = new Event();
                    event.put("statusText", String.class, status.getText());
                    getRemoteStream().put(event);
                } catch (Exception e) {
                }
            }
        }
    }
}
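The adapter above is a standard producer-consumer arrangement: the Twitter client thread enqueues statuses, while a dequeuer thread drains the queue and turns each status into a platform event. A self-contained sketch of the same pattern using only java.util.concurrent (the String payloads and `emitted` list stand in for the ASTRI Status/Event/stream types):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class AdapterSketch {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> emitted = new ArrayList<>();

    // Producer side: called by the source (e.g. a Twitter client callback).
    public void onStatus(String statusText) {
        queue.add(statusText);
    }

    // Consumer side: drain until the queue stays empty for the poll timeout.
    public void drain() {
        try {
            String status;
            while ((status = queue.poll(100, TimeUnit.MILLISECONDS)) != null) {
                // In the real adapter this would wrap the status in an Event
                // and put it on the remote stream.
                emitted.add(status);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public List<String> emitted() {
        return emitted;
    }
}
```

The blocking queue decouples the ingest rate from the processing rate, which is also the hook the slide's load controller uses to shed or throttle input under overload.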
Twitter-counter

public class TopicExtractorPE extends LoadControlProcessingElement {

    Streamable<Event> downStream;

    public void setDownStream(Streamable<Event> stream) {
        this.downStream = stream;
    }

    public void onEvent(Event event) {
        String text = event.get("statusText", String.class);
        Iterable<String> split = Splitter.on(' ').omitEmptyStrings().trimResults().split(text);
        for (String topic : split) {
            if (!topic.startsWith("#")) {
                continue;
            }
            String topicOnly = topic.substring(1);
            if (topicOnly.length() == 0 || topicOnly.contains("#")) {
                continue;
            }
            super.throughMsgDropper(new TopicEvent(topicOnly, 1));
        }
    }

    protected void eventProcessing(Object arg0) {
        downStream.put((TopicEvent) arg0);
    }
}
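TopicExtractorPE keeps only tokens that start with `#`, strips the marker, and drops empty or malformed topics. The same filtering logic can be sketched with the plain JDK instead of Guava's Splitter (illustrative only, not the PE itself):

```java
import java.util.ArrayList;
import java.util.List;

public class HashtagExtractor {
    // Extract hashtag topics from a status line: keep tokens starting
    // with '#', strip the '#', and drop empty topics or topics that
    // still contain another '#'.
    public static List<String> extract(String text) {
        List<String> topics = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.startsWith("#")) {
                continue;
            }
            String topicOnly = token.substring(1);
            if (topicOnly.isEmpty() || topicOnly.contains("#")) {
                continue;
            }
            topics.add(topicOnly);
        }
        return topics;
    }
}
```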
Twitter-counter

public class TopicCountAndReportPE extends ProcessingElement {

    transient Stream<TopicEvent> downStream;
    transient int threshold = 10;
    int count;                 // per-topic state (elided on the slide)
    boolean firstEvent = true; // per-topic state (elided on the slide)

    public void setDownstream(Stream<TopicEvent> aggregatedTopicStream) {
        this.downStream = aggregatedTopicStream;
    }

    public void onEvent(TopicEvent event) {
        if (firstEvent) {
            logger.info("Handling new topic [{}]", getId());
            firstEvent = false;
        }
        count += event.getCount();
    }

    public void onTime() {
        if (count < threshold) {
            return;
        }
        downStream.put(new TopicEvent(getId(), count));
    }
}
Twitter-counter

public class TopNTopicPE extends ProcessingElement {

    Map<String, Integer> countedTopics = Maps.newHashMap();

    public void onEvent(TopicEvent event) {
        countedTopics.put(event.getTopic(), event.getCount());
    }

    public void onTime() {
        TreeSet<TopNEntry> sortedTopics = Sets.newTreeSet();
        for (Map.Entry<String, Integer> topicCount : countedTopics.entrySet()) {
            sortedTopics.add(new TopNEntry(topicCount.getKey(), topicCount.getValue()));
        }
        StringBuilder sb = new StringBuilder();
        Iterator<TopNEntry> iterator = sortedTopics.iterator();
        int i = 0; // loop counter (elided on the slide)
        while (iterator.hasNext() && i < 10) {
            TopNEntry entry = iterator.next();
            sb.append("topic [" + entry.topic + "] count [" + entry.count + "]\n");
            i++;
        }
        sb.append("\n");
        // write sb into files
    }
}
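`TopNEntry` is not shown on the slides; for the TreeSet iteration above to visit the most frequent topics first, it presumably implements Comparable ordered by descending count. A plausible sketch (the field names come from the slide; the alphabetical tie-break is an assumption, added so a TreeSet does not collapse distinct topics with equal counts):

```java
public class TopNEntry implements Comparable<TopNEntry> {
    final String topic;
    final int count;

    public TopNEntry(String topic, int count) {
        this.topic = topic;
        this.count = count;
    }

    @Override
    public int compareTo(TopNEntry other) {
        // Higher counts sort first.
        if (this.count != other.count) {
            return Integer.compare(other.count, this.count);
        }
        // Tie-break alphabetically so equal-count topics stay distinct.
        return this.topic.compareTo(other.topic);
    }
}
```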
Twitter-counter

public class TwitterCounterApp extends App {

    protected void onInit() {
        // init TopNTopicPE
        TopNTopicPE topNTopicPE = createPE(TopNTopicPE.class);
        topNTopicPE.setTimerInterval(10, TimeUnit.SECONDS);
        // we checkpoint this PE every 20s
        topNTopicPE.setCheckpointingConfig(new CheckpointingConfig.Builder(CheckpointingMode.TIME)
                .frequency(20).timeUnit(TimeUnit.SECONDS).build());

        @SuppressWarnings("unchecked")
        Stream<TopicEvent> aggregatedTopicStream = createStream("AggregatedTopicSeen",
                new KeyFinder<TopicEvent>() {
                    @Override
                    public List<String> get(final TopicEvent arg0) {
                        return ImmutableList.of("aggregationKey");
                    }
                }, topNTopicPE);

        // init TopicCountAndReportPE
        …
        // init TopicExtractorPE
        …
    }
}
Twitter-counter

public class TwitterCounterApp extends App {

    protected void onInit() {
        // init TopNTopicPE
        …
        // init TopicCountAndReportPE
        TopicCountAndReportPE topicCountAndReportPE = createPE(TopicCountAndReportPE.class);
        topicCountAndReportPE.setDownstream(aggregatedTopicStream);
        topicCountAndReportPE.setTimerInterval(10, TimeUnit.SECONDS);
        // we checkpoint instances every 2 events
        topicCountAndReportPE.setCheckpointingConfig(
                new CheckpointingConfig.Builder(CheckpointingMode.EVENT_COUNT).frequency(2).build());

        Stream<TopicEvent> topicSeenStream = createStream("TopicSeen", new KeyFinder<TopicEvent>() {
            @Override
            public List<String> get(final TopicEvent arg0) {
                return ImmutableList.of(arg0.getTopic());
            }
        }, topicCountAndReportPE);

        // init TopicExtractorPE
        …
    }
}
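The two KeyFinders encode the routing rule: the "TopicSeen" stream keys events by topic, so each TopicCountAndReportPE instance accumulates exactly one topic, while "AggregatedTopicSeen" uses a single constant key so every aggregate reaches the one TopNTopicPE instance. The effect of keying can be sketched as key-to-instance routing (the registry below is purely illustrative, not the ASTRI runtime):

```java
import java.util.HashMap;
import java.util.Map;

public class PERouter {
    // One logical PE instance (here, just a counter) per distinct key,
    // created lazily on first use -- keying by topic therefore yields
    // one counter per topic; a constant key yields a single instance.
    private final Map<String, Integer> countersByKey = new HashMap<>();

    public void route(String key, int delta) {
        countersByKey.merge(key, delta, Integer::sum);
    }

    public int countFor(String key) {
        return countersByKey.getOrDefault(key, 0);
    }

    public int instanceCount() {
        return countersByKey.size();
    }
}
```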
Twitter-counter

public class TwitterCounterApp extends LoadControlApp {

    protected void onInit() {
        // init TopNTopicPE
        …
        // init TopicCountAndReportPE
        …
        // init TopicExtractorPE
        TopicExtractorPE topicExtractorPE = createPE(TopicExtractorPE.class);
        topicExtractorPE.setDownStream(topicSeenStream);
        topicExtractorPE.setSingleton(true);
        createInputStream("RawStatus", topicExtractorPE);
        if (this.getHadoopApplicationId() != null) { // enable LS & LB by AppMaster
            super.initLoadController(topicExtractorPE);
        }
    }
}
An Example: High Frequency Algorithmic Trading

• CRA-INVESTIGATOR (Transaction Cost Analysis), Charles River Advisors Limited
– One tick per 1 ms per security (millions of ticks/events per second)
– Complex data/event processing for HF algorithmic trading
– Distributed computational security model
– Fault tolerance / failover protection
– Challenges:
• Regulations / risk management
• Latency
• Throughput
• Scalability
• Performance

[Architecture diagram components: GUI, Market Service, Database, Aggregation Analytics (Execution Analytics, Market Analytics), Order & Execution Service, Algorithmic Trading Engine, Market Data Provider (Real-time / Historical)]
CRA-INVESTIGATOR (Transaction Cost Analysis)
Conclusion

• Proven
• Decentralized
• Scalable / elastic
• Extensible
• Cluster management
• Fault-tolerance
• Load balancing
• Guaranteed data processing
• No intermediate message broker
• Higher-level abstraction than message transmission
• “Just works”
Thank You