The Journey of Moving from AWS ELK to a GCP Data Pipeline
![Page 1: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/1.jpg)
Building a DMP (Data Management Platform) on Top of GCP
VMFive - Randy Huang
![Page 2: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/2.jpg)
Agenda
• Migrating the Pipeline to GCP
• Cost Comparison
• Business Use Case
• Fluentd Demo
![Page 3: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/3.jpg)
ELK + AWS EMR
(Diagram: Kinesis, Lambda)
![Page 4: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/4.jpg)
Pros & Cons
• Pros:
• Well supported.
• Well documented.
• Easy to find references.
• Cons:
• High cost.
• Not open source.
• Scale has to be decided up front.
![Page 5: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/5.jpg)
Pipeline on GCP
• Dataflow
• BigQuery
• Machine Learning
• Data Visualization
• Compute Engine
• Global Load Balancing
![Page 6: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/6.jpg)
Data Studio
![Page 7: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/7.jpg)
The Products and Services logos may be used to accurately reference Google's technology and tools, for instance in architecture diagrams. 7
• Batch: Cloud Storage
• Time-series streaming: Cloud Pub/Sub
• Processing (batch and streaming): Cloud Dataflow
• Storage: BigQuery
• BI analysis
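The streaming half of this slide (Cloud Pub/Sub feeding Cloud Dataflow, landing in BigQuery) usually aggregates time-series events into fixed windows. A minimal pure-Python sketch of one-minute tumbling-window counts, assuming events arrive as hypothetical `(timestamp_seconds, key)` tuples; this is not the Apache Beam/Dataflow API, only the idea behind a fixed-window count.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape for this sketch: (epoch seconds, event key).
Event = Tuple[float, str]


def fixed_window_counts(events: Iterable[Event], window_secs: int = 60) -> dict:
    """Count events per (window_start, key), like a tumbling-window count."""
    counts: Counter = Counter()
    for ts, key in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = int(ts // window_secs) * window_secs
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(0.5, "impression"), (12.0, "click"), (59.9, "impression"), (61.0, "impression")]
    for (start, key), n in sorted(fixed_window_counts(sample).items()):
        print(start, key, n)
    # 0 click 1
    # 0 impression 2
    # 60 impression 1
```

A real streaming job also has to deal with late data and watermarks, which this sketch ignores.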
![Page 8: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/8.jpg)
Targeting Engines
• Data sources, each via Cloud Pub/Sub: device-related, behavior-related, and 3rd-party data
• Transform data
• Machine learning applications: Spark MLlib on Cloud Dataproc; hosted models on Cloud Machine Learning
• API backend: Compute Engine, App Engine
• Real-Time Prediction API
• Redis on Compute Engine
![Page 9: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/9.jpg)
Pros & Cons
• Pros:
• Cost-effective.
• Operationally efficient.
• Google has your back.
• Cons:
• APIs/SDKs change every day.
• Some services are still in beta.
• Docs are scattered everywhere.
![Page 10: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/10.jpg)
Workflow Monitoring
• Digdag (alternatives: Airflow/Oozie/Luigi)
• Native Python & Ruby support
• Multi-cloud
• Modular
• Workflow as code
• Docker support
• Alerting to Slack
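To make "workflow as code" with failure alerting concrete, here is a minimal pure-Python sketch of a dependency-ordered task runner with an alert hook. It is not Digdag (Digdag workflows are written as `.dig` files); `run_workflow`, the task names, and the `alert` callback, which stands in for a Slack notification, are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, Set[str]],
    alert: Callable[[str, Exception], None],
) -> List[str]:
    """Run each task after its dependencies; stop and alert on the first failure.

    Returns the names of the tasks that completed successfully, in run order.
    """
    # Every task becomes a node, even one with no dependencies.
    graph = {name: deps.get(name, set()) for name in tasks}
    done: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        try:
            tasks[name]()
        except Exception as exc:
            alert(name, exc)  # hypothetical hook; a real setup might post to Slack here
            break
        done.append(name)
    return done


if __name__ == "__main__":
    def load_to_bigquery() -> None:
        raise RuntimeError("BigQuery load failed")

    alerts: List[str] = []
    steps = {"extract": lambda: None, "load": load_to_bigquery, "report": lambda: None}
    order = {"load": {"extract"}, "report": {"load"}}
    print(run_workflow(steps, order, lambda name, exc: alerts.append(f"{name}: {exc}")))
    # ['extract']
    print(alerts)
    # ['load: BigQuery load failed']
```

Stopping at the first failure keeps downstream tasks (here `report`) from running on incomplete data; a scheduler like Digdag adds retries, scheduling, and a UI on top of this basic ordering.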
![Page 11: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/11.jpg)
Digdag Sample
![Page 12: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/12.jpg)
Digdag
![Page 13: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/13.jpg)
![Page 14: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/14.jpg)
Cost Comparison
• $2,000 per month on AWS
• About $200 per month on GCP for production
• About another $200 per month for dev
• 50M events per month
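Normalizing these figures by volume makes the gap easier to compare. A small worked calculation using only the numbers on this slide (both GCP figures are approximate); the function names are illustrative.

```python
EVENTS_PER_MONTH = 50_000_000  # 50M events per month, from the slide

# Monthly spend in USD, from the slide; both GCP figures are "about".
MONTHLY_COST_USD = {
    "AWS": 2000,
    "GCP production": 200,
    "GCP production + dev": 400,
}


def cost_per_million_events(monthly_usd: float, events: int = EVENTS_PER_MONTH) -> float:
    """Monthly spend divided by volume, in USD per million events."""
    return monthly_usd / (events / 1_000_000)


def savings_vs_aws(monthly_usd: float) -> float:
    """Fraction of the AWS monthly bill saved."""
    return 1 - monthly_usd / MONTHLY_COST_USD["AWS"]


if __name__ == "__main__":
    for setup, usd in MONTHLY_COST_USD.items():
        print(f"{setup}: ${cost_per_million_events(usd):.2f} per million events, "
              f"{savings_vs_aws(usd):.0%} below AWS")
    # AWS: $40.00 per million events, 0% below AWS
    # GCP production: $4.00 per million events, 90% below AWS
    # GCP production + dev: $8.00 per million events, 80% below AWS
```

Even counting the dev environment, the GCP setup comes to roughly a fifth of the AWS bill at this volume.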
![Page 15: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/15.jpg)
Business Use Cases
• Digital Ads Targeting
• User Behavior Tagging
• BI
• GEO Reporting
• KPI Reporting
• User Demographics
![Page 16: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/16.jpg)
Some Tips
• BigQuery
• https://status.cloud.google.com/incident/bigquery/18022
• Worked around with Fluentd’s retry and HA
• Dataflow’s SDK and docs are not in sync
• Dataflow side inputs have a bug in streaming mode
• Compute Engine SLB: use a TCP/UDP setup for forwarding
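The BigQuery tip relies on the output retrying instead of dropping events during an outage. Fluentd's buffered outputs retry failed flushes with exponential backoff; the sketch below shows that idea in plain Python. It is not Fluentd's code: `send` is a hypothetical stand-in for a BigQuery insert, and the defaults for `max_retries`, `base_wait`, and `max_wait` are illustrative.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    send: Callable[[], T],
    max_retries: int = 5,
    base_wait: float = 1.0,
    max_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on failure wait base_wait * 2**attempt (capped at max_wait) and retry.

    Re-raises the last error once max_retries retries have been used up.
    """
    attempt = 0
    while True:
        try:
            return send()
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(min(base_wait * (2 ** attempt), max_wait))
            attempt += 1


if __name__ == "__main__":
    failures = iter([True, True, False])  # fail twice, then succeed
    waits: List[float] = []

    def flaky_insert() -> str:
        if next(failures):
            raise ConnectionError("BigQuery temporarily unavailable")
        return "inserted"

    # Injecting sleep=waits.append records the waits instead of actually sleeping.
    print(retry_with_backoff(flaky_insert, sleep=waits.append))  # inserted
    print(waits)  # [1.0, 2.0]
```

Backing off exponentially avoids hammering a degraded service while still delivering the buffered events once it recovers; HA (for example, multiple aggregators behind a load balancer) covers the case where the forwarding node itself fails.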
![Page 17: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/17.jpg)
Fluentd Update
• Release notes for v0.14:
• Sub-second event flush
• New plugin APIs support formatting configuration values dynamically
(e.g., path /my/dest/${tag}/mydata.%Y-%m-%d.log)
• Secure forward
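In the v0.14 path example, `${tag}` is filled from the event tag and `%Y-%m-%d` from the time of the buffered events. A minimal Python sketch of that substitution, for illustration only; `render_path` is a hypothetical helper, not Fluentd's implementation, and it handles only `${tag}` plus strftime codes.

```python
from datetime import datetime


def render_path(template: str, tag: str, event_time: datetime) -> str:
    """Expand strftime codes (e.g. %Y-%m-%d) first, then the ${tag} placeholder."""
    # strftime leaves "${tag}" untouched, and doing it first means a "%" inside
    # the tag value is never interpreted as a format code.
    return event_time.strftime(template).replace("${tag}", tag)


if __name__ == "__main__":
    print(render_path("/my/dest/${tag}/mydata.%Y-%m-%d.log", "nginx.access", datetime(2017, 1, 24)))
    # /my/dest/nginx.access/mydata.2017-01-24.log
```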
![Page 18: The journey of Moving from AWS ELK to GCP Data Pipeline](https://reader031.fdocuments.in/reader031/viewer/2022030314/5886f4e51a28abba528b7abf/html5/thumbnails/18.jpg)
Demo
• Nginx -> Fluentd -> BigQuery -> Data Studio
• MySQL -> Fluentd -> BigQuery
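In the Nginx -> Fluentd -> BigQuery demo, Fluentd tails the access log and turns each line into a structured record before it is loaded into BigQuery. A minimal Python sketch of that parsing step, assuming Nginx's default `combined` log format; the regex, `parse_access_line`, and the field names are illustrative, not the demo's actual configuration or BigQuery schema.

```python
import re
from datetime import datetime
from typing import Optional

# Nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
# "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Turn one access-log line into a flat row, or None if the line doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    row = m.groupdict()
    row["status"] = int(row["status"])
    row["body_bytes_sent"] = int(row["body_bytes_sent"])
    # Normalize Nginx's local time (e.g. 24/Jan/2017:13:05:09 +0800) to ISO 8601.
    row["time"] = datetime.strptime(row.pop("time_local"), "%d/%b/%Y:%H:%M:%S %z").isoformat()
    return row


if __name__ == "__main__":
    sample = ('203.0.113.7 - - [24/Jan/2017:13:05:09 +0800] '
              '"GET /ads/v1/track?id=42 HTTP/1.1" 200 512 "-" "Mozilla/5.0"')
    print(parse_access_line(sample))
```

Returning `None` for unparseable lines lets the caller count or route them separately instead of failing the whole load.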