IoT NY - Google Cloud Services for IoT
IoT NY - Cloud services for IoT
James Chittenden, Google Cloud Platform Solutions
Google confidential │ Do not distribute
Agenda
1. Big Data the Cloud Way - Why would you?
2. Fully Managed: NoOps Ingest, Process & Analyse
3. Hands On Demo: Building an Event Streaming Pipeline
20-?? billion devices will be connected by 2020
$4-11 trillion economic impact
54% of top-performing companies will invest more in sensors this year
Sources: Gartner, PwC, McKinsey
IoT is a transition from not connected to connected
Wearables
Watches
Phones
Cars
Home Appliances
Existing Business-Owned Equipment
A datacenter is not a collection of computers; a datacenter is a computer.
The same is happening in the Cloud today
State-of-the-art data centers.
For the past 17 years, Google has been building out the world’s fastest, most powerful, highest-quality cloud infrastructure.
Google’s Big Data innovations go far back
Timeline (2002 to 2014) showing: GFS, MapReduce, Bigtable, Dremel, Colossus, FlumeJava, Spanner, MillWheel, BigQuery, Dataflow
Google Cloud Platform
Management
Mobile
Services
Compute
Big Data
Networking
Storage
Developer Tools
Manage the Entire Lifecycle of Big Data
Stages: Capture, Store, Process, Analyze
Services shown: Pub/Sub, Kafka, Cloud Storage, Cloud SQL, Cloud Datastore, Dataflow, Hadoop/Spark, BigQuery
Big Data Architecture with Google managed services
Your Data
Pub/Sub
Cloud Storage
Bigtable
Dataflow
BigQuery: Fast ETL, Regex, JSON, UDFs
Spreadsheets
BI Tools
Coworkers
Applications + Reports
GCS-Hadoop Connector
Hadoop on Compute Engine (unmanaged)
Cloud Dataproc (managed)
Google BigQuery makes complex data analysis simple
Scales automatically
No setup or administration
Stream up to 100,000 rows per second
Easily integrates with third-party software
BigQuery Use @Google: DoubleClick Support
Question: Find the root cause of why an ad was or was not delivered in the last 30 days.
SELECT date, rejection_reason, COUNT(*)
FROM line_item_table.last30days
WHERE line_item_id = 56781234
GROUP BY date, rejection_reason
1.2B rows scanned, result in ~5 seconds!
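For readers less used to SQL, the aggregation in the DoubleClick query can be sketched in plain Java. This is only an in-memory illustration of what the query counts, not how BigQuery executes it; the `Row` class, the sample dates, and the rejection-reason strings are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RejectionCounts {
    // Hypothetical row shape: only the three columns the query touches.
    static final class Row {
        final long lineItemId;
        final String date;
        final String rejectionReason;

        Row(long lineItemId, String date, String rejectionReason) {
            this.lineItemId = lineItemId;
            this.date = date;
            this.rejectionReason = rejectionReason;
        }
    }

    // Filters on line_item_id, then counts rows per (date, rejection_reason) pair.
    static Map<String, Long> countByDateAndReason(List<Row> rows, long lineItemId) {
        Map<String, Long> counts = new TreeMap<>();
        for (Row row : rows) {
            if (row.lineItemId != lineItemId) {
                continue; // WHERE line_item_id = ?
            }
            // GROUP BY date, rejection_reason with COUNT(*)
            counts.merge(row.date + " | " + row.rejectionReason, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Row> rows = Arrays.asList(
            new Row(56781234L, "day-1", "REASON_A"),
            new Row(56781234L, "day-1", "REASON_A"),
            new Row(56781234L, "day-2", "REASON_B"),
            new Row(11111111L, "day-1", "REASON_A"));
        for (Map.Entry<String, Long> e : countByDateAndReason(rows, 56781234L).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The point of the slide is that BigQuery runs the same kind of grouping over 1.2B rows without any cluster setup on the user's side.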
BigQuery scales “Google scale”
Streaming ingest at peak: 2.3 million rows per second
Largest data lake on BigQuery: 38 petabytes
Largest query by data size: 2.1 petabytes
Largest query by rows: 10.5 trillion rows
What is BigQuery?
Externalization of Google Dremel
Convenience of SQL
Petabyte-Scale and Fast
Fully Managed, No-Ops Data Warehouse
Google Cloud Dataflow makes complex data analysis simple
Merges batch and stream processing
Data processing pipelines
Monitoring interface
Significantly lower cost
Runs on Google or Cloudera Spark (GitHub)
What is Cloud Dataflow?
Cloud Dataflow is a collection of SDKs for building batch or streaming parallelized data processing pipelines.
Cloud Dataflow is also a fully managed service for executing optimized parallelized data processing pipelines.
Cloud Pub/Sub
• Globally redundant
• Low latency (sub sec.)
• Batched read/write
• Custom labels
• Push & Pull
• Auto expiration
Diagram: Publishers A, B, and C publish Messages 1, 2, and 3 to Topics A, B, and C. Subscriber X reads through Subscriptions XA and XB, Subscriber Y through YC, and Subscriber Z through ZC, so Message 3 reaches both Subscriber Y and Subscriber Z.
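The topic and subscription relationship in the diagram can be sketched as a small in-memory model. This is a toy illustration, not the Cloud Pub/Sub client API: the `PubSubModel` class and its methods are hypothetical, it ignores acknowledgements, push delivery, and expiration, and the pairing of publishers with topics in `main` is an assumption based on the subscription names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class PubSubModel {
    // Topic name -> names of the subscriptions attached to it.
    private final Map<String, List<String>> subscriptionsByTopic = new HashMap<>();
    // Subscription name -> messages not yet pulled from it.
    private final Map<String, Queue<String>> backlog = new HashMap<>();

    void createTopic(String topic) {
        subscriptionsByTopic.putIfAbsent(topic, new ArrayList<>());
    }

    void createSubscription(String subscription, String topic) {
        List<String> attached = subscriptionsByTopic.get(topic);
        if (attached == null) {
            throw new IllegalArgumentException("unknown topic: " + topic);
        }
        attached.add(subscription);
        backlog.put(subscription, new ArrayDeque<>());
    }

    // Every subscription on the topic gets its own copy of the message.
    void publish(String topic, String message) {
        List<String> attached = subscriptionsByTopic.get(topic);
        if (attached == null) {
            throw new IllegalArgumentException("unknown topic: " + topic);
        }
        for (String subscription : attached) {
            backlog.get(subscription).add(message);
        }
    }

    // Returns the next message for the subscription, or null if none is waiting.
    String pull(String subscription) {
        Queue<String> queue = backlog.get(subscription);
        if (queue == null) {
            throw new IllegalArgumentException("unknown subscription: " + subscription);
        }
        return queue.poll();
    }

    public static void main(String[] args) {
        PubSubModel pubsub = new PubSubModel();
        for (String topic : new String[] {"A", "B", "C"}) {
            pubsub.createTopic(topic);
        }
        pubsub.createSubscription("XA", "A");
        pubsub.createSubscription("XB", "B");
        pubsub.createSubscription("YC", "C");
        pubsub.createSubscription("ZC", "C");

        pubsub.publish("A", "Message 1");
        pubsub.publish("B", "Message 2");
        pubsub.publish("C", "Message 3");

        System.out.println("Subscriber X: " + pubsub.pull("XA") + ", " + pubsub.pull("XB"));
        System.out.println("Subscriber Y: " + pubsub.pull("YC"));
        System.out.println("Subscriber Z: " + pubsub.pull("ZC"));
    }
}
```

Keeping a separate backlog per subscription is what lets Subscribers Y and Z each see Message 3 independently.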
Dataflow goodies
Autoscaling mid-job
Fully managed - No-Ops
Intuitive Data Processing Framework
Batch and Stream Processing in one
Liquid sharding mid-job
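// Slide-level sketch of a batch pipeline in the Dataflow Java SDK.
// ExtractTags and ExpandPrefixes are user-defined DoFns that are not shown.
Pipeline p = Pipeline.create();
p.begin()
 .apply(TextIO.Read.from("gs://…"))       // read input lines from Cloud Storage
 .apply(ParDo.of(new ExtractTags()))      // pull tags out of each line
 .apply(Count.create())                   // count occurrences of each tag
 .apply(ParDo.of(new ExpandPrefixes()))   // key each tag by its prefixes
 .apply(Top.largestPerKey(3))             // keep the top 3 tags per prefix
 .apply(TextIO.Write.to("gs://…"));       // write results to Cloud Storage
p.run();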
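To make the data flow through these steps concrete, here is a rough single-machine equivalent in plain Java with no Dataflow SDK involved. The slide does not show `ExtractTags` or `ExpandPrefixes`, so their behavior here is an assumption: tags are taken to be `#`-prefixed tokens, and prefix expansion is taken to emit every leading substring of a tag. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TopTagsSketch {
    // Assumed ExtractTags behavior: every whitespace-separated token starting with '#'.
    static List<String> extractTags(String line) {
        List<String> tags = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (token.length() > 1 && token.startsWith("#")) {
                tags.add(token.substring(1).toLowerCase());
            }
        }
        return tags;
    }

    // Count step: number of occurrences of each tag across all lines.
    static Map<String, Long> countTags(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String tag : extractTags(line)) {
                counts.merge(tag, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Assumed ExpandPrefixes plus top-n-per-key: for each prefix, the n most frequent tags.
    static Map<String, List<String>> topPerPrefix(Map<String, Long> counts, int n) {
        Map<String, List<Map.Entry<String, Long>>> byPrefix = new TreeMap<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            String tag = entry.getKey();
            for (int i = 1; i <= tag.length(); i++) {
                byPrefix.computeIfAbsent(tag.substring(0, i), k -> new ArrayList<>()).add(entry);
            }
        }
        // Highest count first; ties broken alphabetically so the output is deterministic.
        Comparator<Map.Entry<String, Long>> byCountDescThenTag = (a, b) -> {
            int c = Long.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<Map.Entry<String, Long>>> group : byPrefix.entrySet()) {
            List<Map.Entry<String, Long>> sorted = new ArrayList<>(group.getValue());
            sorted.sort(byCountDescThenTag);
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(n, sorted.size()); i++) {
                top.add(sorted.get(i).getKey());
            }
            result.put(group.getKey(), top);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input lines for illustration only.
        List<String> lines = Arrays.asList("#iot rocks #iot", "#ios #iot", "#ion #ios #ink");
        System.out.println(topPerPrefix(countTags(lines), 3));
    }
}
```

In the managed service each of these loops becomes a parallel transform, which is why the pipeline version can be distributed without changes to the user's logic.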
Deploy
Schedule & Monitor
Autoscaling mid-job: 800 RPS, 1200 RPS, 5000 RPS, 50 RPS
Batch and Stream Processing in one
// Streaming variant of the batch pipeline: the TextIO source and sink are
// replaced with PubsubIO, and a 5-minute fixed window is added.
Pipeline p = Pipeline.create();
p.begin()
 .apply(PubsubIO.Read.from("input_topic"))
 .apply(Window.<Integer>by(FixedWindows.of(5, MINUTES)))
 .apply(ParDo.of(new ExtractTags()))
 .apply(Count.create())
 .apply(ParDo.of(new ExpandPrefixes()))
 .apply(Top.largestPerKey(3))
 .apply(PubsubIO.Write.to("output_topic"));
p.run();
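The fixed 5-minute window is what makes counting meaningful on an unbounded stream: each event is assigned to the window that contains its timestamp, and aggregation happens per window. A minimal plain-Java sketch of that assignment follows, assuming timestamps are epoch milliseconds and windows are aligned to the epoch; the class and method names are hypothetical and this is not the SDK's implementation.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class FixedWindowSketch {
    // Window length from the slide: 5 minutes, expressed in milliseconds.
    static final long WINDOW_MILLIS = 5 * 60 * 1000L;

    // Start of the fixed window that contains the given timestamp.
    static long windowStart(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, WINDOW_MILLIS);
    }

    // Counts events per window, keyed by window start time.
    static SortedMap<Long, Integer> countPerWindow(long[] eventTimestamps) {
        SortedMap<Long, Integer> counts = new TreeMap<>();
        for (long timestamp : eventTimestamps) {
            counts.merge(windowStart(timestamp), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up event times: three in the first window, one in each of the next two.
        long[] events = {0L, 1_000L, 299_999L, 300_000L, 650_000L};
        System.out.println(countPerWindow(events));
    }
}
```

Because the window boundary depends only on the event's own timestamp, the same grouping applies whether events arrive from a file or from a live topic.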
Chart axis: Nighttime, Mid-Day, Nighttime
Demo Time
Ingest: Pub/Sub
Process: Cloud Dataflow
Analyse: BigQuery
Git: https://github.com/james-google/event-streams-dataflow