Flink September 2015 Community Update

Berlin Apache Flink Meetup #11

Community Update

September 2015

Robert Metzger

Committer and PMC

[email protected]

@rmetzger_


Apache Flink is an open source platform for scalable batch and stream data processing.

Apache Flink is …

flink.apache.org

• The core of Flink is a distributed streaming dataflow engine.
  • Executing dataflows in parallel on clusters
  • Providing a reliable foundation for various workloads
• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers
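To illustrate the idea of one dataflow serving both bounded and unbounded input, here is a minimal sketch in plain Python. It does not use Flink or its DataSet/DataStream APIs; the operator names (`flat_map`, `running_count`, `word_count`) and the sample inputs are made up for this example. The same chain of operators is applied once to a finite list and once to an endless generator.

```python
from itertools import count, islice
from typing import Callable, Dict, Iterable, Iterator, Tuple


def flat_map(fn: Callable[[str], Iterable[str]], records: Iterable[str]) -> Iterator[str]:
    """Apply fn to each record and emit every element it returns."""
    for record in records:
        yield from fn(record)


def running_count(keys: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Keyed running count: emit an updated (key, count) pair per input record."""
    counts: Dict[str, int] = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


def word_count(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical dataflow: source -> split into words -> keyed running count."""
    return running_count(flat_map(str.split, lines))


def endless_lines() -> Iterator[str]:
    """Illustrative unbounded source that never terminates on its own."""
    for i in count():
        yield "tick" if i % 2 == 0 else "tock tick"


# Bounded input: the dataflow runs until the input is exhausted.
batch_result = list(word_count(["to be", "or not to be"]))

# Unbounded input: the same dataflow, consumed lazily; we only take 5 results.
stream_result = list(islice(word_count(endless_lines()), 5))
```

Because every operator is a lazy generator, records flow through the pipeline one at a time, so the bounded case is just a stream that happens to end.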


One engine for many use cases


Real time streaming topologies

Machine Learning at scale

Graph Analysis

Long batch pipelines


What happened?

• New Committer: Matthias Sax
• 0.9.1 released
• Discussions for releasing 0.10 started
• Cascading on Flink released: https://github.com/dataArtisans/cascading-flink
• Flink+NiFi integration pull request opened


Now in master (0.10-SNAPSHOT)


• Flink dropped Hadoop 2.2.0 support (we require 2.3.0)
• Scala 2.11 artifacts are now available
• Support for allocating off-heap memory
• New window operators (general purpose and processing time windows)
  • old implementation: 50K / sec / core (gets slower over time, high GC overhead)
  • new implementation w/o pre-aggregation: 800K / sec / core (moderate GC overhead)
  • new implementation w/ pre-aggregation: 3 million / sec / core (low GC overhead)
• Rolling HDFS file sink for DataStream API
• Sink for Elasticsearch
• New JobManager dashboard
• New FlinkKafkaProducer
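The difference between windowing with and without pre-aggregation can be sketched in plain Python. This is not Flink's implementation or API; the function names, the millisecond timestamps, and the sum aggregate are assumptions chosen for illustration, and window firing is not modelled. The buffering variant keeps every element until the window is evaluated, while the pre-aggregating variant keeps a single running value per window, which may be part of why the slide reports lower GC overhead for it.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, int]  # (timestamp in milliseconds, value); illustrative type


def buffered_window_sums(events: Iterable[Event], size_ms: int) -> Dict[int, int]:
    """Tumbling windows without pre-aggregation: buffer all elements, then sum."""
    buffers: Dict[int, List[int]] = {}
    for ts, value in events:
        # Window index is the timestamp divided by the window size.
        buffers.setdefault(ts // size_ms, []).append(value)
    # State held until evaluation grows with the number of elements per window.
    return {window: sum(values) for window, values in buffers.items()}


def preaggregated_window_sums(events: Iterable[Event], size_ms: int) -> Dict[int, int]:
    """Tumbling windows with pre-aggregation: fold each element in on arrival."""
    sums: Dict[int, int] = {}
    for ts, value in events:
        window = ts // size_ms
        # State per window is a single running sum, regardless of element count.
        sums[window] = sums.get(window, 0) + value
    return sums


# Made-up sample data: six events spread over three 5-second windows.
events = [(500, 1), (1200, 2), (4900, 3), (5000, 4), (9900, 5), (10100, 6)]
buffered = buffered_window_sums(events, 5000)
preaggregated = preaggregated_window_sums(events, 5000)
```

Both variants produce the same per-window result; they differ only in how much state is held while a window is open. Pre-aggregation applies when the window function can be computed incrementally (sum, count, min, max), not when it needs all elements at once.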


Flink among “The best open source big data tools”



Articles

• data Artisans blog: Kafka + Flink: A practical, how-to guide [1]

• Gartner blog: Apache Flink Offers a Challenge to Spark [2]

• data Artisans blog: Batch is a special case of streaming [3]

• Flink blog: Off-heap Memory in Apache Flink and the curious JIT compiler [4]

• MapR blog: Apache Flink: A New Way to Handle Streaming Data [5]

• Big Data Knowledge Base: Happenings in the Flink Community - September 2015 [6]

[1] http://data-artisans.com/kafka-flink-a-practical-how-to/
[2] http://blogs.gartner.com/nick-heudecker/apache-flink-offers-a-challenge-to-spark/
[3] http://data-artisans.com/batch-is-a-special-case-of-streaming/
[4] http://flink.apache.org/news/2015/09/16/off-heap-memory.html
[5] https://www.mapr.com/blog/apache-flink-new-way-handle-streaming-data
[6] http://sparkbigdata.com/102-spark-blog-slim-baltagi/17-happenings-in-the-flink-community-september-2015





Events in September


VLDB 2015 Conference Workshop

Flink Training in Berlin

Washington DC Meetup

Meetup in Belgium

Milwaukee Meetup

Budapest: 2 ApacheCon Talks

BigTop Workshop

data2day Conference in Karlsruhe

Chicago Meetup


GitHub stats


Flink Forward: 2-day conference with free training in Berlin, Germany

• Schedule: http://flink-forward.org/?post_type=day


