Better Stream Processing with Python
Taking the Hipster out of Streaming
Andreas Heider, Robert Wall
12.07.2017, EuroPython
Who are we?
• Developers at Winton
• Winton is a global investment management and data science company, founded in 1997
• We believe the scientific method can be profitably applied to the field of investing
What do we mean by Stream processing?
Batch Stream
Example: Real Time Financial Market Data
Time Symbol Price Qty
10:15:01 AAPL $144 10
10:15:02 GOOG $940 5
10:15:03 AAPL $145 11
…
Diagram: the Exchange emits a stream of Trades (10:15:01 AAPL 10@$144, 10:15:02 GOOG 5@$940)
Stream processing: Binning
Time Symbol Price Qty
10:15:01 AAPL $144 10
10:15:02 GOOG $940 5
10:15:03 AAPL $145 11
…
Binning Process
Time Symbol Avg. Price Volume
10:15 AAPL $144.5 1300
10:15 GOOG $943 1250
10:16 AAPL $145.3 1450
…
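The binning step above can be sketched in a few lines of plain Python. This is only an illustration, not code from the talk: it assumes one-minute bins keyed by (minute, symbol) and a volume-weighted average price, which the slide does not specify. The names `Bin` and `bin_trades` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bin:
    """Running totals for one (minute, symbol) bin."""
    notional: float = 0.0  # sum of price * quantity
    volume: int = 0        # sum of quantity

    @property
    def avg_price(self) -> float:
        # Assumption: the average is volume-weighted.
        return self.notional / self.volume


def bin_trades(trades):
    """Fold trades into one-minute bins, one event at a time."""
    bins = defaultdict(Bin)
    for time, symbol, price, qty in trades:
        minute = time[:5]  # "10:15:01" -> "10:15"
        current = bins[(minute, symbol)]
        current.notional += price * qty
        current.volume += qty
    return dict(bins)


# The three trades shown on the slide.
trades = [
    ("10:15:01", "AAPL", 144.0, 10),
    ("10:15:02", "GOOG", 940.0, 5),
    ("10:15:03", "AAPL", 145.0, 11),
]

result = bin_trades(trades)
for (minute, symbol), current in sorted(result.items()):
    print(minute, symbol, round(current.avg_price, 2), current.volume)
```

Only the running totals per bin are kept, so the state stays small no matter how many trades arrive.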
Streaming Data at Winton
Diagram labels: Event Streams; Market Data; Alternative Data; Internal / Business Events; Monitoring; Databases; Risk Management; Investment Management; Analytics; Transformations; Research
Apache Kafka
Producer → Topic (Partition 1, Partition 2, Partition 3) → Consumer
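How a keyed message ends up in one of the three partitions can be sketched as below. This is a simplified stand-in, not Kafka's actual partitioner: real clients use their own hash functions (the Java client hashes keys with murmur2), while this sketch uses `zlib.crc32`. The function name `choose_partition` is hypothetical.

```python
import zlib

NUM_PARTITIONS = 3  # matches the three partitions in the diagram


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition: same key, same partition."""
    # Assumption: CRC32 stands in for the client's real hash function.
    return zlib.crc32(key) % num_partitions


# Route a few keyed trades; each partition keeps its messages in arrival order.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for symbol, trade in [("AAPL", "10@$144"), ("GOOG", "5@$940"), ("AAPL", "11@$145")]:
    partitions[choose_partition(symbol.encode())].append((symbol, trade))

for partition, messages in partitions.items():
    print(partition, messages)
```

Because every trade for a symbol lands in the same partition, a consumer of that partition sees that symbol's trades in order.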
Sprawl of Stream Processing systems
Kafka Streams
• Simple library, not a framework
• Event-at-a-time stream processing
• Stateful processing, joins and aggregations
• Distributed processing and fault tolerance
• Part of main Apache Kafka project
• Java only so far :(
Python at Winton
Many users, with different skill sets:
• Developers
• Researchers
• Operations
• …
Talking to Kafka using kafka-python
Hipster Stream Processing
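The code shown on this slide did not survive in the transcript. A minimal consumer loop with kafka-python might look like the sketch below; it is not the authors' code. The topic name, broker address, consumer group and JSON message encoding are all assumptions, and `decode_trade` and `print_trades` are hypothetical names.

```python
import json


def decode_trade(raw: bytes) -> dict:
    """Decode one message value; the JSON encoding is an assumption."""
    return json.loads(raw.decode("utf-8"))


def print_trades(topic: str = "trades", servers: str = "localhost:9092") -> None:
    """Read trades from Kafka and print them (needs a running broker)."""
    # Imported here so decode_trade can be used without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        group_id="binning-demo",  # hypothetical consumer group
        value_deserializer=decode_trade,
    )
    # Iterating over the consumer blocks and yields one record at a time.
    for message in consumer:
        print(message.partition, message.offset, message.value)
```

Calling `print_trades()` against a running broker prints each record as it arrives. Everything beyond printing (state, retries, scaling out) is left to the script's author.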
Python Kafka Clients
https://github.com/dpkp/kafka-python
• Pure Python implementation
• Friendly, pythonic interface
https://github.com/confluentinc/confluent-kafka-python
• Wrapper around C library
• Amazingly high performance and robustness
Experiences using low-level client
• What starts out as a 10 line script ends up as yet another homegrown streaming framework
• The devil is in the details:
  • Guaranteeing at-least-once (or even exactly-once) processing
  • Handling stateful processing
  • Distributing load over various machines
  • Microbatching
  • Handling rebalances nicely
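To make the first of these details concrete, here is a small simulation of at-least-once processing. It uses an in-memory list instead of Kafka, and `run_consumer` and its `crash_at` parameter are hypothetical names. The idea it illustrates: commit the offset only after the event has been processed, so a crash in between causes a replay rather than a lost event.

```python
def run_consumer(log, committed, sink, crash_at=None):
    """Process log entries starting at the committed offset.

    Returns the committed offset after the run. The offset advances only
    after an event has been processed, which gives at-least-once delivery.
    """
    offset = committed
    while offset < len(log):
        sink.append(log[offset])  # 1. process the event
        if offset == crash_at:
            return offset         # simulated crash: processed but not committed
        offset += 1               # 2. commit the next offset
    return offset


log = ["t1", "t2", "t3"]
sink = []

# First run crashes after processing t2 but before committing it.
committed = run_consumer(log, 0, sink, crash_at=1)
# The restart resumes from the last committed offset, so t2 is processed again.
committed = run_consumer(log, committed, sink)

print(sink)
```

The duplicate `t2` in the sink is the price of at-least-once. Removing it needs idempotent writes or exactly-once support, which is where a homegrown script starts turning into a framework.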
Kafka Streams for Python
https://github.com/wintoncode/winton-kafka-streams
Demo
Goals / Roadmap
1. Clean implementation of Kafka’s core streams API in Python
2. Experiment with more pythonic API/DSL
3. Optimise performance via batching/numpy/Arrow
4. Implement more advanced features of Kafka’s streams API (exactly once, …)
Get in touch!
• Project on GitHub: https://github.com/wintoncode/winton-kafka-streams
• Roadmap: https://github.com/wintoncode/winton-kafka-streams/blob/master/ROADMAP.md
• Announcement on kafka-dev
• Come to our stand and talk to us
• Thanks to Confluent
Questions?
Backup
Some words of experience
• Not everything fits the streaming model
• Manually changing data is tricky
  • Be careful what you put in, have a recovery method
• Stable deployment can be challenging
  • Especially ZooKeeper and buggy clients
• Set up monitoring from the start
  • We use Prometheus and Grafana
  • https://github.com/yahoo/kafka-manager