Event Driven Architectures with Apache Kafka on Heroku
Chris Castle, Developer Advocate
Rand Fitzpatrick, Director of Product
November 3, 2016
What problems does Apache Kafka solve?
What are the core concepts of Kafka?
Why Apache Kafka on Heroku?
Forward-Looking Statements
Statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results
expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be
deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other
financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any
statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new
functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our
operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any
litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our
relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of
our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to
larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is
included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent
fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor
Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently
available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based
upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
What problems does Apache Kafka solve?
Event-Driven Architecture
Event-driven architecture (EDA), also known as message-driven architecture, is a software architecture pattern promoting the production, detection, consumption of, and reaction to events.
Source: Wikipedia
What Are Events?
Context
When did the event occur? (event time, processing time)
What produced the event? (causal history, device, etc.)
Where did the event occur? (system location, geo location)
Operation
What function was applied? (create, update, delete, etc.)
What are the characteristics of the function?
State
What is the data involved in the event?
How is that data identified?
"Contextualized operation on state"
Event Examples
Product views
Completed sales
Page visits
Site logins
Shipping notifications
Inventory received
IoT sensor values
Weather data
Traffic data
Tweets
Election polling data
Example event: Completed sale
Context: 2016-11-03T15:13:27Z; Retail www site; referrer Google search
Operation: Inventory item purchased
State: Amazon Echo, Black; $179.99; ID B00X4WHP5E
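To make the "contextualized operation on state" idea concrete, here is a minimal sketch that encodes the completed-sale example as plain Python data. The class and field names (`Context`, `Operation`, `State`, `event_time`, `source`, `referrer`, `price_cents`) are hypothetical illustrations, not a Kafka or Heroku schema; the classes are frozen to mirror the idea that an event, once recorded, is not modified.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Context:
    """When, where, and by what the event was produced (hypothetical fields)."""
    event_time: datetime
    source: str    # what produced the event
    referrer: str  # extra provenance detail


@dataclass(frozen=True)
class Operation:
    """What function was applied."""
    name: str


@dataclass(frozen=True)
class State:
    """The data involved in the event and how it is identified."""
    item_id: str
    description: str
    price_cents: int  # integer cents avoid floating-point rounding on prices


@dataclass(frozen=True)
class Event:
    """An event = context + operation + state; frozen, so it cannot be mutated."""
    context: Context
    operation: Operation
    state: State


completed_sale = Event(
    context=Context(
        event_time=datetime(2016, 11, 3, 15, 13, 27, tzinfo=timezone.utc),
        source="Retail www site",
        referrer="Google search",
    ),
    operation=Operation(name="Inventory item purchased"),
    state=State(
        item_id="B00X4WHP5E",
        description="Amazon Echo, Black",
        price_cents=17999,
    ),
)
```

In practice such an object would be serialized (for example to JSON or Avro) before being written to a topic; the split into three parts simply mirrors the questions listed above.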
Why Should I Care?
Inbound Streams
Scaling too slowly leads to dropped data
Overprovisioning leads to inefficient systems
Backpressure and other forms of coordination are hard
Data Pipelines
Dataflow between processing stages requires coordination
Parallel pipelines with the same data can be nontrivial
Provenance and auditability are hard to track
Microservices
Service discovery must support current and future processes
Sequencing service availability is critical to system function
Possible loss of state when individual services fail
Why Should I Care?
Inbound Streams
Event streams in Kafka allow other resources to pull when ready
Resources can fail and reconnect without dropping events
Kafka provides elasticity, reducing the need for backpressure
Data Pipelines
Dataflow coordination is reduced via event stream structure
The immutability of data makes parallel processing straightforward
Tracking provenance and lineage of data becomes possible
Microservices
Services now only need to discover topics in Kafka
Service availability sequencing is relaxed
Inter-service communication is more robust
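The pull model and fail-and-reconnect behavior can be illustrated with a toy, in-memory append-only log. This is a conceptual sketch only, not the Kafka client API: `Log`, `Consumer`, `poll_and_process`, and the simulated crash are hypothetical. Each consumer tracks its own committed offset (real Kafka consumer groups commit offsets to the cluster), so independent readers share the same immutable data, and a consumer that crashes before committing re-reads its batch instead of losing it.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """A toy append-only event log; records are never modified in place."""
    records: list = field(default_factory=list)

    def append(self, event):
        self.records.append(event)
        return len(self.records) - 1  # offset of the newly appended record

    def read(self, offset, max_records=10):
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Pulls from the log when ready and tracks its own committed offset."""
    log: Log
    committed: int = 0
    processed: list = field(default_factory=list)

    def poll_and_process(self, max_records=10, crash_after=None):
        batch = self.log.read(self.committed, max_records)
        for i, event in enumerate(batch):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash before commit")
            self.processed.append(event)
        # Commit only after the whole batch is processed: a crash means the
        # batch is re-read later (at-least-once), so events may repeat but
        # are never skipped.
        self.committed += len(batch)


log = Log()
for e in ["view:1", "view:2", "sale:1"]:
    log.append(e)

billing = Consumer(log)    # hypothetical downstream service
analytics = Consumer(log)  # a second, independent reader of the same data

try:
    billing.poll_and_process(crash_after=2)
except RuntimeError:
    pass  # billing "reconnects" later; its committed offset is still 0

billing.poll_and_process()
analytics.poll_and_process()
```

After this runs, `billing` has seen `view:1` and `view:2` twice (the price of at-least-once delivery) but has not dropped `sale:1`, while `analytics` read the same records at its own pace without affecting `billing`.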
Use Cases: Heroku Platform Event Stream
Learn more at https://blog.heroku.com/powering-the-heroku-platform-api-a-distributed-systems-approach-using-streams-and-apache-kafka
Use Cases: Heroku Operational Experience (App Metrics)
Use Cases: Heroku App Metrics
Learn more at https://engineering.heroku.com/blogs/2016-05-26-heroku-metrics-there-and-back-again/
Use Cases: Twitter Analytics Dashboard
Use Cases Generalized
Patterns: Inbound Streams, Data Pipelines, Microservices
Use cases: Platform Event Stream, App Metrics, Twitter Analytics
What are the core concepts of Kafka?
Apache Kafka Core Concepts
Brokers
The instances running Kafka and managing streams of events in a cluster.
Producers + Consumers
Clients that write to or read from a Kafka cluster.
Topics
Streams of events that are replicated across the brokers. Topics are configured with time-based retention or log compaction.
Partitions
Discrete subsets of a topic, and important tuning points for parallelism and ordering: a topic's partitions can be read in parallel, and Kafka guarantees event order only within a single partition.
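The parallelism/ordering trade-off comes from how keyed events are assigned to partitions: the same key always lands in the same partition, so events for one key stay in order while different keys spread across partitions. The sketch below shows the idea with hypothetical helpers (`choose_partition`, `partition_events`). Kafka's default partitioner hashes key bytes with murmur2; `zlib.crc32` is a stand-in here, so the partition numbers will not match a real cluster.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    # A stable hash: the same key always maps to the same partition.
    # crc32 stands in for Kafka's murmur2, so actual numbers differ from Kafka.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_events(events, num_partitions):
    """Distribute (key, value) events; order is preserved within each partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in events:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions


events = [
    ("customer-42", "viewed B00X4WHP5E"),
    ("customer-7", "logged in"),
    ("customer-42", "added to cart"),
    ("customer-42", "completed sale"),
]
partitions = partition_events(events, num_partitions=3)
```

All three `customer-42` events end up in one partition in their original order; raising `num_partitions` allows more consumers to work in parallel, at the cost of having no ordering guarantee across partitions.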
Example Producers and Consumers
Event: example producer → example consumer
Product views: Web server → Personalization engine
Completed sales: Payment processor → Accounting system
Page visits: Browser → Reporting dashboard
Site logins: Authentication service → Security audit service
Shipping notifications: Shipping provider → Shipping provider
Inventory received: Warehouse → Inventory database
IoT data: Motion sensor → Actuator
Weather data: Rain gauge → Climate model
Traffic data: Vehicle sensor → Traffic map
Tweets: Twitter → Analytics dashboard
Election polling data: Online/phone survey → Election forecast
Complex Architecture
Complex Controls
Other Kafka primitives that provide structure to event streams:
Retention
Log compaction
Replication factor
Delivery guarantees
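Retention and log compaction are two different ways of bounding a topic's size, and a small sketch makes the difference visible. The functions below (`apply_retention`, `compact`) and the record layout `(timestamp_seconds, key, value)` are hypothetical simplifications: real Kafka applies these policies per log segment, leaves the active segment alone, and keeps tombstones for a configurable period before removing them.

```python
from typing import Dict, List, Optional, Tuple

# (timestamp_seconds, key, value); a value of None is a tombstone (delete marker)
Record = Tuple[int, str, Optional[str]]


def apply_retention(log: List[Record], now: int, retention_seconds: int) -> List[Record]:
    """Time-based retention: drop records older than the retention window."""
    return [r for r in log if now - r[0] <= retention_seconds]


def compact(log: List[Record]) -> List[Record]:
    """Log compaction: keep only the latest record per key, preserving order.
    A tombstone (value None) removes its key entirely in this simplified model."""
    latest: Dict[str, int] = {}
    for i, (_, key, _) in enumerate(log):
        latest[key] = i
    return [r for i, r in enumerate(log) if latest[r[1]] == i and r[2] is not None]


inventory_log: List[Record] = [
    (100, "item-B00X4WHP5E", "stock=10"),
    (200, "item-B00X4WHP5E", "stock=9"),
    (300, "item-OTHER", "stock=5"),  # hypothetical second item
    (400, "item-OTHER", None),       # tombstone: item-OTHER was deleted
]

retained = apply_retention(inventory_log, now=450, retention_seconds=300)
compacted = compact(inventory_log)
```

Retention keeps the most recent window of history regardless of key, which suits streams like page views; compaction keeps the latest value for every key indefinitely, which suits streams that represent current state, such as inventory levels.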
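Interacting with Kafka
Kafka Connect
A framework for streaming data between Kafka and external systems through source and sink connectors.
Some examples: HDFS, JDBC, Elasticsearch, Couchbase, Oracle, MS SQL Server, Cassandra, DynamoDB, Salesforce Streaming API, Splunk, and many more...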
Image credit: Confluent Kafka Connect announcement blog post
Why Apache Kafka on Heroku?
Without Heroku
Apache Kafka: The heart of the event management system, with a broad variety of configurations and options.
Apache ZooKeeper: The system’s consensus and coordination cluster, vital for Kafka’s operation.
OS + JVM Tuning: Tuning the cluster runtimes can be an art.
Instances + Networking: Physical or virtual, the infrastructure behind clusters must be carefully planned.
Myriad Moving Pieces
Apache Kafka on Heroku: Simple Configuration
Apache Kafka on Heroku: Automated Operations
Self-Healing
Current Version
No-Downtime Upgrades
Apache Kafka on Heroku: Experienced Staff
Heroku engineers have contributed patches to the core open source Kafka project.
Apache Kafka on Heroku: Global
Regions: US West, US East, Ireland, Germany, Japan, Sydney
Let's Review... and get you started with Kafka!
Apache Kafka is a valuable tool for building architectures that support inbound event streams, data processing pipelines, and microservice coordination. Kafka's primitives (topics, partitions, retention duration, log compaction, and replication) provide the tools to manage structured event streams. Apache Kafka on Heroku reduces operational complexity so that any developer can get started quickly and feel confident that their application is supported by a rock-solid production service.
Get started at hrku.co/use-kafka
Q&A
Rand Fitzpatrick, Director of Product
Chris Castle, Developer Advocate
Learn More
Apache Kafka on Heroku: https://www.heroku.com/kafka
Get Started: https://elements.heroku.com/addons/heroku-kafka
Documentation: https://devcenter.heroku.com/articles/kafka-on-heroku
Kafka Event Stream Modeling: https://devcenter.heroku.com/articles/kafka-event-stream-modeling
Podcast: Managed Kafka with Heroku Engineer Tom Crayford: http://softwareengineeringdaily.com/2016/10/25/managed-kafka-with-tom-crayford/
Thank you!