Flink September 2015 Community Update
Transcript of Flink September 2015 Community Update
Berlin Apache Flink Meetup #11
Community Update
September 2015
Robert Metzger
Committer and PMC
@rmetzger_
Apache Flink is an open source platform for scalable batch and stream data processing.
Apache Flink is …
flink.apache.org
• The core of Flink is a distributed streaming dataflow engine.
• Executing dataflows in parallel on clusters
• Providing a reliable foundation for various workloads
• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers
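To make the dataflow idea concrete, here is a minimal sketch in plain Python. It is not Flink code and does not use the DataSet or DataStream API; every function name here (`source`, `flat_map_words`, `partition`, `word_count`) is hypothetical and chosen for illustration only. It shows records flowing through chained operators and being routed to partitions by key, which is roughly what a parallel dataflow engine does across a cluster.

```python
from collections import Counter


def source(lines):
    """Emit records one at a time; a bounded list is just a stream that ends."""
    for line in lines:
        yield line


def flat_map_words(records):
    """Split each incoming line into lower-cased words."""
    for line in records:
        for word in line.lower().split():
            yield word


def partition(records, parallelism):
    """Route each record to a partition by a deterministic hash of its key."""
    parts = [[] for _ in range(parallelism)]
    for word in records:
        parts[sum(word.encode()) % parallelism].append(word)
    return parts


def word_count(lines, parallelism=2):
    """Chain the operators, then count each partition independently."""
    parts = partition(flat_map_words(source(lines)), parallelism)
    total = Counter()
    # Each partition could be handled by a separate worker; here they run in turn.
    for part in parts:
        total.update(Counter(part))
    return dict(total)
```

Because equal words always land in the same partition, the per-partition counts can be merged without coordination between workers.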
One engine for many use cases
Real time streaming topologies
Machine Learning at scale
Graph Analysis
Long batch pipelines
What happened?
• New Committer: Matthias Sax
• 0.9.1 released
• Discussions for releasing 0.10 started
• Cascading on Flink released: https://github.com/dataArtisans/cascading-flink
• Flink+NiFi integration pull request opened
Now in master (0.10-SNAPSHOT)
• Flink dropped Hadoop 2.2.0 support (we require 2.3.0)
• Scala 2.11 artifacts are now available
• Support for allocating off-heap memory
• New window operators (general purpose and processing time windows)
• old implementation: 50K / sec / core (gets slower over time, high GC overhead)
• new implementation w/o pre-aggregation: 800K / sec / core (moderate GC overhead)
• new implementation w/ pre-aggregation: 3 million / sec / core (low GC overhead)
• Rolling HDFS file sink for DataStream API
• Sink for Elasticsearch
• New JobManager dashboard
• New FlinkKafkaProducer
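The window numbers above contrast operators with and without pre-aggregation. The following plain-Python sketch illustrates the general idea only; it is not Flink's implementation, and the names `BufferingWindow`, `PreAggregatingWindow`, and `run_tumbling` are hypothetical. It assumes events arrive in timestamp order and uses simple tumbling windows.

```python
from collections import defaultdict


class BufferingWindow:
    """Keeps every element until the window fires, then reduces them."""

    def __init__(self, reduce_fn):
        self.reduce_fn = reduce_fn
        self.buffers = defaultdict(list)  # key -> all elements seen so far

    def add(self, key, value):
        self.buffers[key].append(value)

    def fire(self):
        results = {}
        for key, values in self.buffers.items():
            acc = values[0]
            for v in values[1:]:
                acc = self.reduce_fn(acc, v)
            results[key] = acc
        self.buffers.clear()
        return results


class PreAggregatingWindow:
    """Folds each element into one running value per key on arrival."""

    def __init__(self, reduce_fn):
        self.reduce_fn = reduce_fn
        self.partials = {}  # key -> single partial aggregate

    def add(self, key, value):
        if key in self.partials:
            self.partials[key] = self.reduce_fn(self.partials[key], value)
        else:
            self.partials[key] = value

    def fire(self):
        results = dict(self.partials)
        self.partials.clear()
        return results


def run_tumbling(window, events, window_size):
    """Feed (timestamp, key, value) events through fixed-size tumbling windows."""
    output = []
    current = None
    for ts, key, value in events:
        idx = ts // window_size
        if current is not None and idx != current:
            output.append((current, window.fire()))
        current = idx
        window.add(key, value)
    if current is not None:
        output.append((current, window.fire()))
    return output
```

Both variants produce the same results, but the pre-aggregating one holds a single value per key instead of every element. That smaller footprint is one plausible reason for the lower GC overhead reported on the slide.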
Flink among “The best open source big data tools”
Articles
• data Artisans blog: Kafka + Flink: A practical, how-to guide [1]
• Gartner blog: Apache Flink Offers a Challenge to Spark [2]
• data Artisans blog: Batch is a special case of streaming [3]
• Flink blog: Off-heap Memory in Apache Flink and the curious JIT compiler [4]
• MapR blog: Apache Flink: A New Way to Handle Streaming Data [5]
• Big Data Knowledge Base: Happenings in the Flink Community - September 2015 [6]
[1] http://data-artisans.com/kafka-flink-a-practical-how-to/
[2] http://blogs.gartner.com/nick-heudecker/apache-flink-offers-a-challenge-to-spark/
[3] http://data-artisans.com/batch-is-a-special-case-of-streaming/
[4] http://flink.apache.org/news/2015/09/16/off-heap-memory.html
[5] https://www.mapr.com/blog/apache-flink-new-way-handle-streaming-data
[6] http://sparkbigdata.com/102-spark-blog-slim-baltagi/17-happenings-in-the-flink-community-september-2015
Events in September
VLDB 2015 Conference Workshop
Flink Training in Berlin
Washington DC Meetup
Meetup in Belgium
Milwaukee Meetup
Budapest: 2 ApacheCon Talks
BigTop Workshop
data2day Conference in Karlsruhe
Chicago Meetup
GitHub stats
Flink Forward: 2-day conference with free training in Berlin, Germany
• Schedule: http://flink-forward.org/?post_type=day