Unified Log London: Analytics on write with AWS Lambda
Page 1
Analytics on write with AWS Lambda
Unified Log London, 4th November 2015
Page 2
Introducing myself
• Alex Dean
• Co-founder and technical lead at Snowplow, the open-source event analytics platform based here in London [1]
• Weekend writer of Unified Log Processing, available on the Manning Early Access Program [2]
[1] https://github.com/snowplow/snowplow
[2] http://manning.com/dean
Page 3
Analytics on read, analytics on write
Page 4
It’s easier to start by explaining analytics on read, which is much more widely practised and understood
1. Write all of our events to some kind of event store
2. Read the events from our event store to perform some analysis
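The two steps above can be sketched in miniature (hypothetical Python, not from the talk): the event store is just a list, and all analysis is a full scan performed at read time.

```python
# Analytics on read: step 1 persists every raw event; step 2 aggregates
# only when someone asks a question.
event_store = []

def write(event):
    event_store.append(event)  # step 1: write all events to the store

def miles_per_truck():
    # step 2: read everything back and aggregate at query time
    totals = {}
    for e in event_store:
        if e["type"] == "TruckArrivesEvent":
            totals[e["vin"]] = totals.get(e["vin"], 0) + e["miles"]
    return totals

write({"type": "TruckArrivesEvent", "vin": "A", "miles": 10})
write({"type": "TruckArrivesEvent", "vin": "A", "miles": 5})
print(miles_per_truck())  # {'A': 15}
```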
Page 5
In analytics on write, the analysis is performed on the events in-stream (i.e. before reaching storage)
• Read our events from our event stream
• Analyze our events using a stream processing framework
• Write the summarized output of our analysis to some storage target
• Serve the summarized output into real-time dashboards, reports etc
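By contrast, the analytics-on-write steps look like this in miniature (again a hypothetical sketch): the aggregate is updated as each event streams through, so the storage target only ever holds the summary, not the raw events.

```python
# Analytics on write: the analysis runs in-stream, before storage,
# and only the summarized output is written out.
summary = {}  # storage target: vin -> total miles observed

def on_event(e):
    # analysis happens per-event, in-stream
    if e["type"] == "TruckArrivesEvent":
        summary[e["vin"]] = summary.get(e["vin"], 0) + e["miles"]

for e in [{"type": "TruckArrivesEvent", "vin": "A", "miles": 10},
          {"type": "TruckArrivesEvent", "vin": "B", "miles": 3}]:
    on_event(e)

print(summary)  # {'A': 10, 'B': 3}
```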
Page 6
Analytics on write and analytics on read are good at different things, and leverage different technologies
Page 7
With a unified log powered by Kafka or Kinesis, you can apply both analytical approaches to your event stream
• Apache Kafka and Amazon Kinesis make it easy to have multiple consuming apps on the same event stream
• Each consuming app can maintain its own “cursor position” on the stream
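The "own cursor position" idea can be sketched as follows (hypothetical Python; real Kafka/Kinesis offsets work on partitions/shards, but the principle is the same): two apps read the same append-only log without interfering with each other.

```python
# Two consumers over one shared log, each tracking its own offset ("cursor").
log = ["e1", "e2", "e3", "e4"]
cursors = {"dashboard-app": 0, "archiver-app": 0}

def poll(consumer, max_records=2):
    start = cursors[consumer]
    batch = log[start:start + max_records]
    cursors[consumer] = start + len(batch)  # each app advances independently
    return batch

assert poll("dashboard-app") == ["e1", "e2"]
assert poll("dashboard-app") == ["e3", "e4"]
assert poll("archiver-app") == ["e1", "e2"]  # archiver is unaffected
```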
Page 8
Getting started with analytics on write
Page 9
What are some good use cases for getting started with analytics on write?
1. Low-latency operational reporting, which must be fed from the incoming event streams in as close to real-time as possible
2. Dashboards to support thousands of simultaneous users – for example, a freight company might share a parcel tracker on its website for customers
3. Others? Please share your thoughts!
Page 10
Analytics on write is a very immature space – there’s only a handful of tools and frameworks available so far…
PipelineDB
• Analytics on write (“continuous views”) using SQL
• Implemented as a Postgres fork
• Supports Kafka but no sharding yet (I believe)

amazon-kinesis-aggregators
• Reads from Kinesis streams and outputs to DynamoDB & CloudWatch
• JSON-based query recipes
• Written by Ian Meyers here in London

Druid
• Hybrid analytics on write, analytics on read
• Very rich JSON-based query language
• Supports Kafka
Page 11
… or we can implement a bespoke analytics on write solution – for example with AWS Lambda
• The central idea of AWS Lambda is that developers should be writing functions, not servers
• With Lambda, we write self-contained functions to process events, and then we publish those functions to Lambda to run
• We don’t worry about developing, deploying or managing servers – instead, Lambda takes care of auto-scaling our functions to meet the incoming event volumes
Page 12
An AWS Lambda function is stateless and exists only for the side effects that it performs
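A sketch of that shape (hypothetical Python; the `process` helper and the fabricated test event are mine, but a Kinesis-triggered Lambda does receive each record's payload base64-encoded under `event["Records"][i]["kinesis"]["data"]`):

```python
import base64, json

seen = []  # stands in for the side effect (e.g. a DynamoDB write)

def process(e):
    seen.append(e["type"])

def handler(event, context):
    # Decode each Kinesis record in the micro-batch and act on it.
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        process(payload)
    # Nothing is retained between invocations: the function is stateless.

fake = {"Records": [{"kinesis": {"data": base64.b64encode(
    json.dumps({"type": "TruckArrivesEvent"}).encode()).decode()}}]}
handler(fake, None)
print(seen)  # ['TruckArrivesEvent']
```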
Page 13
Designing an analytics on write solution for OOPS
Page 14
Let’s imagine that we have a global delivery company called OOPS, which has five event types
Page 15
OOPS management want a near-real-time dashboard to tell them two things:

1. Where is each of our delivery trucks now?
2. How many miles has each of our delivery trucks driven since its last oil change?
Page 16
In DynamoDB, we could represent this as a simple table
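One row per truck, keyed by VIN, could look like this (a sketch: the attribute names are taken from the conditional-write code later in the deck, but the sample values are invented):

```python
# A single DynamoDB item for one truck, keyed by its VIN.
row = {
    "vin": "1HGCM82633A004352",           # partition key
    "mileage": 12453,
    "mileage-at-oil-change": 11003,
    "latitude": 51.5074,
    "longitude": -0.1278,
    "location-timestamp": "2015-11-04T12:00:00Z",
}

# Question 2 ("miles since last oil change") is then a simple difference:
print(row["mileage"] - row["mileage-at-oil-change"])  # 1450
```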
Page 17
All we need to do is write an AWS Lambda function to populate this DynamoDB table from our event stream…
Page 18
… however a more efficient approach is to apply some old-school map-reduce to the micro-batch first
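The pre-aggregation step can be sketched like this (hypothetical Python): instead of issuing one conditional write per event, fold the micro-batch down to one row per truck first, keeping only the freshest values, and write once per truck.

```python
# Map-reduce over a micro-batch: many events in, one row per truck out.
def reduce_batch(events):
    rows = {}
    for e in events:
        r = rows.setdefault(e["vin"], {"vin": e["vin"], "mileage": 0})
        r["mileage"] = max(r["mileage"], e["mileage"])  # mileage only grows
    return list(rows.values())

batch = [{"vin": "A", "mileage": 100},
         {"vin": "A", "mileage": 130},
         {"vin": "B", "mileage": 55}]
print(reduce_batch(batch))  # two rows to write, instead of three
```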
Page 19
What do we mean when we talk about conditional writes in DynamoDB?
Page 20
Bonus: DynamoDB’s conditional write syntax is very readable

def conditionalWrite(row: Row) {
  val vin = AttrVal.toJavaValue(row.vin)
  // Mileage only ever increases, so write only if the stored value is absent or lower
  updateIf(vin, "SET #m = :m",
    "attribute_not_exists(#m) OR #m < :m",
    Map(":m" -> AttrVal.toJavaValue(row.mileage)),
    Map("#m" -> "mileage"))
  // mileageAtOilChange is optional, hence the for-comprehension over the Option
  for (maoc <- row.mileageAtOilChange) {
    updateIf(vin, "SET #maoc = :maoc",
      "attribute_not_exists(#maoc) OR #maoc < :maoc",
      Map(":maoc" -> AttrVal.toJavaValue(maoc)),
      Map("#maoc" -> "mileage-at-oil-change"))
  }
  // Location is guarded by its timestamp, so late-arriving events can't overwrite fresher positions
  for ((loc, ts) <- row.locationTs) {
    updateIf(vin, "SET #ts = :ts, #lat = :lat, #long = :long",
      "attribute_not_exists(#ts) OR #ts < :ts",
      Map(":ts" -> AttrVal.toJavaValue(ts.toString),
        ":lat" -> AttrVal.toJavaValue(loc.latitude),
        ":long" -> AttrVal.toJavaValue(loc.longitude)),
      Map("#ts" -> "location-timestamp", "#lat" -> "latitude",
        "#long" -> "longitude"))
  }
}
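The "write only if newer" semantics behind those condition expressions can be simulated without DynamoDB (a hypothetical Python sketch; `update_if_greater` mirrors `attribute_not_exists(#m) OR #m < :m`):

```python
table = {}  # in-memory stand-in for the DynamoDB table, keyed by vin

def update_if_greater(vin, attr, value):
    item = table.setdefault(vin, {})
    # Mirrors the condition "attribute_not_exists(#m) OR #m < :m"
    if attr not in item or item[attr] < value:
        item[attr] = value
        return True
    return False  # conditional check failed: the write is rejected

update_if_greater("A", "mileage", 120)
update_if_greater("A", "mileage", 90)   # out-of-order event, rejected
print(table["A"]["mileage"])  # 120
```

This is what makes the Lambda safe to run over an at-least-once stream: replayed or out-of-order micro-batches can never move a truck's state backwards.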
Page 21
Demo
Page 22
To simplify the demo, I performed some configuration steps already (1/2)
1. Downloaded the Scala code from https://github.com/alexanderdean/Unified-Log-Processing/tree/master/ch11/11.2/aow-lambda
2. Built a “fatjar” for my Lambda function ($ sbt assembly)
3. Uploaded my fatjar to Amazon S3 ($ aws s3 cp …)
4. Ran a CloudFormation template to set up permissions for my Lambda, available here: https://ulp-assets.s3.amazonaws.com/ch11/cf/aow-lambda.template
5. Registered my Lambda function with AWS Lambda ($ aws lambda create-function …)
Page 23
To simplify the demo, I performed some configuration steps already (2/2)
6. Created a Kinesis stream ($ aws kinesis create-stream …)
7. Created a DynamoDB table ($ aws dynamodb create-table …)
8. Configured the registered Lambda function to use the Kinesis stream as its input ($ aws lambda create-event-source-mapping --event-source-arn ${stream_arn} --function-name AowLambda --enabled --batch-size 100 --starting-position TRIM_HORIZON)
Page 24
Finally, let’s feed in some OOPS events…
host$ vagrant ssh
guest$ cd /vagrant/ch11/11.1
guest$ ./generate.py
Wrote DriverDeliversPackage with timestamp 2015-01-11 00:49:00
Wrote DriverMissesCustomer with timestamp 2015-01-11 04:07:00
Wrote TruckArrivesEvent with timestamp 2015-01-11 04:56:00
Wrote DriverDeliversPackage with timestamp 2015-01-11 06:16:00
Wrote TruckArrivesEvent with timestamp 2015-01-11 07:35:00
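A generator like this might build its event payloads roughly as follows (purely a hypothetical sketch of my own, not the contents of the real generate.py; the field names are guesses):

```python
import json, random, datetime

# The three OOPS event types visible in the generator's output above.
EVENT_TYPES = ["DriverDeliversPackage", "DriverMissesCustomer", "TruckArrivesEvent"]

def make_event(ts):
    # One JSON payload, ready to be put on the Kinesis stream.
    return json.dumps({"type": random.choice(EVENT_TYPES),
                       "timestamp": ts.isoformat()})

payload = make_event(datetime.datetime(2015, 1, 11, 0, 49))
print(payload)
```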
Page 25
… and check our Kinesis stream, Lambda function and DynamoDB table
Page 26
Resources and further reading
Page 27
Further reading
Chapter 11, Analytics on write
Manning Deal of the Day today!
Discount code: dotd110415au (50% off just today)
• https://www.pipelinedb.com/
• https://github.com/awslabs/amazon-kinesis-aggregators/
• http://druid.io/
• https://github.com/snowplow/aws-lambda-nodejs-example-project
• https://github.com/snowplow/aws-lambda-scala-example-project
• https://github.com/snowplow/spark-streaming-example-project
Page 28
Questions?
http://snowplowanalytics.com
https://github.com/snowplow/snowplow
@snowplowdata
To meet up or chat, @alexcrdean on Twitter or [email protected]