Exploring BigData with Google BigQuery
![Page 1: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/1.jpg)
Dharmesh Vaya @DRVaya
http://drvaya.wordpress.com/
![Page 2: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/2.jpg)
Agenda
● What is Big Data?
● Available Big Data Solutions & Issues
● Why Google BigQuery?
● Inside BigQuery
● Features & Components
● RESTful API
● Development with BigQuery (Live Demo)
  ○ Query History, Projects, Datasets, Public Datasets, Table Details, Writing Queries, Save Results
  ○ Integration with Applications
● BigQuery Tools
● Big Data Solution with BigQuery & Google Cloud Platform
● Pricing Model
● Any questions?
![Page 3: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/3.jpg)
What is Big Data?
Is it a data type? No.
It's a buzzword for a massive volume of structured and/or unstructured data,
so large that it is difficult to process and analyze using traditional databases.
![Page 4: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/4.jpg)
What is Big Data?
Data that has the following attributes can be considered 'Big Data':
![Page 5: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/5.jpg)
So how Big is B - I - G?
![Page 6: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/6.jpg)
So how Big is B - I - G?
Library of Congress - Textual Data
20 Terabytes
(20 000 000 000 000 bytes)
![Page 7: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/7.jpg)
So how Big is B - I - G?
Amazon.com - Inventory & Customer Data
42 Terabytes
(42 000 000 000 000 bytes)
![Page 8: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/8.jpg)
So how Big is B - I - G?
YouTube.com - Media Data
100+ Terabytes
(100 000 000 000 000 bytes)
![Page 9: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/9.jpg)
So how Big is B - I - G?
Google.com - Search, Mail, Media & anything you can think of!!
850+ Terabytes
(850 000 000 000 000 bytes) (speculated figures)
![Page 10: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/10.jpg)
So how Big is B - I - G?
World Data Center for Climate - Meteorology Data
6.2 Petabytes
(6 200 000 000 000 000 bytes)
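The byte counts on these slides use decimal (SI) prefixes: 1 TB = 10^12 bytes and 1 PB = 10^15 bytes. The small Python sketch below reproduces the conversion and the space-grouped formatting used on the slides. The helper names are illustrative only.

```python
# Decimal (SI) multipliers, matching the byte counts written out on the slides.
SI_MULTIPLIERS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def to_bytes(amount: float, unit: str) -> int:
    """Convert a size such as (6.2, "PB") into a whole number of bytes."""
    return round(amount * SI_MULTIPLIERS[unit.upper()])


def group_digits(n: int) -> str:
    """Format an integer in groups of three digits separated by spaces."""
    return f"{n:,}".replace(",", " ")


for source, amount, unit in [
    ("Library of Congress", 20, "TB"),
    ("Amazon.com", 42, "TB"),
    ("World Data Center for Climate", 6.2, "PB"),
]:
    print(f"{source}: {amount} {unit} = {group_digits(to_bytes(amount, unit))} bytes")
```

With this convention, 6.2 PB comes out as 6 200 000 000 000 000 bytes. Binary units (1 TiB = 2^40 bytes) would give somewhat larger figures.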
![Page 11: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/11.jpg)
Available Big Data Solutions & Issues
- Highly scalable, distributed computing.
- Storage (HDFS) optimized for high throughput.

- Security is disabled by default.
- MapReduce is batch-based, hence no real-time operations.
- Costly to maintain.

- Highly scalable; claims to handle petabytes.
- Elastic set of resources to return result sets.
- Almost 10x faster than Hadoop.

- High costs of data migration and integration.
- Operations/maintenance costs may shoot up.
![Page 12: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/12.jpg)
Why Google BigQuery?
Hadoop (with Hive) vs. Amazon Redshift vs. Google BigQuery, on a 1.4 TB dataset
On average, BigQuery returns results within 8-10 seconds!!
![Page 13: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/13.jpg)
Inside Google BigQuery
● BigQuery is based on Dremel, a technology pioneered by Google and used extensively within the company.
● It uses columnar storage and multi-level execution trees to achieve interactive performance for queries against multi-terabyte datasets.
● BigQuery's performance advantage comes from its parallel processing architecture.
● A query is processed by thousands of servers in a multi-level execution tree, with the final results aggregated at the root. BigQuery stores data in a columnar format, so only data from the columns being queried is read.
● All of this is now offered as a public service for any business or developer to use, so those outside Google can apply the power of Dremel to their own Big Data processing requirements.
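To see why a columnar layout means "only the queried columns are read", here is a toy, in-memory Python sketch. The table and column names are hypothetical, and this is not how Dremel physically encodes data, which also involves compression and a nested-record encoding. The sketch pivots rows into per-column lists and counts how many values a `SUM(visits)` touches in each layout.

```python
from typing import Dict, List

# Hypothetical sample table; the column names are made up for illustration.
rows: List[Dict[str, object]] = [
    {"name": "alice", "country": "IN", "visits": 3},
    {"name": "bob", "country": "US", "visits": 5},
    {"name": "chen", "country": "IN", "visits": 2},
]


def to_columns(records: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot row-oriented records into one value list per column."""
    columns: Dict[str, List[object]] = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def scan_sum(columns: Dict[str, List[object]], column: str):
    """SUM(column) that touches only the requested column's values."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
total, values_read = scan_sum(columns, "visits")
# A row store would have to read every field of every row to answer the same query.
row_store_values_read = sum(len(record) for record in rows)
print(total, values_read, row_store_values_read)  # 10 3 9
```

On wide tables with hundreds of columns, the gap between the two counts grows in proportion to the number of columns a query ignores.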
![Page 14: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/14.jpg)
Columnar Storage & Trees
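The multi-level execution tree can be sketched as a fan-in of partial aggregates. In this simplified Python model, each hypothetical "leaf server" summarises its own shard. Each parent then merges up to `fan_out` children until a single root remains. The function names and the fan-out parameter are illustrative assumptions, not Dremel's actual interfaces.

```python
from typing import List, Tuple

Partial = Tuple[int, int]  # (running sum, row count)


def leaf(shard: List[int]) -> Partial:
    """A leaf server scans only its own shard and returns a partial aggregate."""
    return sum(shard), len(shard)


def merge(children: List[Partial]) -> Partial:
    """An intermediate (or root) server combines its children's partials."""
    return sum(s for s, _ in children), sum(c for _, c in children)


def tree_avg(shards: List[List[int]], fan_out: int = 2) -> float:
    """AVG over all shards, aggregated level by level up to a single root."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2 for the tree to shrink")
    level = [leaf(shard) for shard in shards]
    while len(level) > 1:
        # Each parent at the next level merges up to `fan_out` children.
        level = [merge(level[i:i + fan_out]) for i in range(0, len(level), fan_out)]
    total, count = level[0]
    return total / count


print(tree_avg([[1, 2], [3, 4], [5], [6, 7, 8]]))  # 4.5
```

The partials carry (sum, count) rather than per-shard averages because averaging averages is wrong when shards differ in size. Sums and counts merge correctly at every level of the tree.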
![Page 15: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/15.jpg)
Inside Google BigQuery
Hey, you mentioned Dremel; isn't MapReduce based on it?
There’s a difference:
● Dremel is designed as an interactive data analysis tool for large datasets.
● MapReduce is designed as a programming framework to batch-process large datasets.
![Page 16: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/16.jpg)
Features & Components
Features:
● Web GUI for BigQuery
● Affordable
● Run in background
● Easy data import
● Flexible (addition of columns, native support for the timestamp data type)
● REST API support
● More than just standard SQL
Components:
● Projects
● Tables
● Datasets
● Jobs
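The components nest. A project contains datasets, datasets contain tables, and jobs (loads, queries, exports, copies) run within a project. A minimal Python model of the storage hierarchy follows, with hypothetical project, dataset and table names. It shows how a table is addressed as `[project:dataset.table]` in legacy SQL and `` `project.dataset.table` `` in standard SQL.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Table:
    table_id: str


@dataclass
class Dataset:
    dataset_id: str
    tables: Dict[str, Table] = field(default_factory=dict)


@dataclass
class Project:
    project_id: str
    datasets: Dict[str, Dataset] = field(default_factory=dict)

    def _table(self, dataset_id: str, table_id: str) -> Table:
        # Raises KeyError if the dataset or table does not exist in this project.
        return self.datasets[dataset_id].tables[table_id]

    def legacy_sql_name(self, dataset_id: str, table_id: str) -> str:
        table = self._table(dataset_id, table_id)
        return f"[{self.project_id}:{dataset_id}.{table.table_id}]"

    def standard_sql_name(self, dataset_id: str, table_id: str) -> str:
        table = self._table(dataset_id, table_id)
        return f"`{self.project_id}.{dataset_id}.{table.table_id}`"


# Hypothetical names, for illustration only.
project = Project("my-project", {"logs": Dataset("logs", {"events": Table("events")})})
print(project.legacy_sql_name("logs", "events"))    # [my-project:logs.events]
print(project.standard_sql_name("logs", "events"))  # `my-project.logs.events`
```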
![Page 17: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/17.jpg)
RESTful API
For Datasets

| Method | HTTP Request |
|---|---|
| delete | DELETE /projects/projectId/datasets/datasetId |
| get | GET /projects/projectId/datasets/datasetId |
| insert | POST /projects/projectId/datasets |
| list | GET /projects/projectId/datasets |
| patch | PATCH /projects/projectId/datasets/datasetId |
| update | PUT /projects/projectId/datasets/datasetId |
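The dataset methods map mechanically onto HTTP verbs and paths. This Python sketch turns the dataset table above into (verb, URL) pairs. It assumes the v2 base URL `https://www.googleapis.com/bigquery/v2`, and `my-project` and `logs` are hypothetical IDs. It only builds requests. Sending them also needs an OAuth 2.0 access token.

```python
from typing import Dict, Optional, Tuple

# Assumed v2 base URL; the relative paths come from the dataset table above.
BASE_URL = "https://www.googleapis.com/bigquery/v2"

# method -> (HTTP verb, whether the path ends in /datasetId)
DATASET_METHODS: Dict[str, Tuple[str, bool]] = {
    "delete": ("DELETE", True),
    "get": ("GET", True),
    "insert": ("POST", False),
    "list": ("GET", False),
    "patch": ("PATCH", True),
    "update": ("PUT", True),
}


def dataset_request(method: str, project_id: str,
                    dataset_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP verb, full URL) for a datasets.* REST method."""
    verb, needs_dataset = DATASET_METHODS[method]
    path = f"/projects/{project_id}/datasets"
    if needs_dataset:
        if dataset_id is None:
            raise ValueError(f"datasets.{method} needs a dataset_id")
        path += f"/{dataset_id}"
    return verb, BASE_URL + path


def dataset_insert_body(project_id: str, dataset_id: str) -> dict:
    """Minimal JSON body for datasets.insert: just the new dataset's reference."""
    return {"datasetReference": {"projectId": project_id, "datasetId": dataset_id}}


print(dataset_request("get", "my-project", "logs"))
```

`patch` and `update` differ in scope. `patch` changes only the fields sent in the body, while `update` replaces the whole dataset resource.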
![Page 18: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/18.jpg)
RESTful API
For Jobs

| Method | HTTP Request |
|---|---|
| get | GET /projects/projectId/jobs/jobId |
| getQueryResults | GET /projects/projectId/queries/jobId |
| insert | POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs and POST /projects/projectId/jobs |
| list | GET /projects/projectId/jobs |
| query | POST /projects/projectId/queries |

Similar methods exist for:
● Projects
● Tables
● TableData
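The `query` method in the jobs table (POST /projects/projectId/queries) is the simplest way to run SQL from code. This minimal Python sketch uses only the standard library and prepares that request without sending it. It assumes the v2 base URL on `www.googleapis.com`. The project ID and the token are placeholders. A real call needs an OAuth 2.0 access token.

```python
import json
import urllib.request

# Assumed v2 base URL (the upload URL on the slide uses the same host).
BASE_URL = "https://www.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, access_token: str,
                        timeout_ms: int = 10_000) -> urllib.request.Request:
    """Prepare, without sending, a jobs.query call: POST /projects/projectId/queries."""
    body = {
        "query": sql,
        "useLegacySql": False,   # the v2 API defaults to legacy SQL if this is omitted
        "timeoutMs": timeout_ms,  # how long the call waits for results before returning
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/projects/{project_id}/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical project and placeholder token; nothing is sent here.
req = build_query_request("my-project", "SELECT 1 AS x", access_token="PLACEHOLDER_TOKEN")
print(req.get_method(), req.full_url)

# With a real OAuth 2.0 token, sending and reading the response would look like:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
#   If result["jobComplete"] is False, poll jobs.getQueryResults
#   (GET /projects/projectId/queries/jobId) using result["jobReference"]["jobId"].
```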
![Page 19: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/19.jpg)
Demo using Web Interface
![Page 20: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/20.jpg)
Demo: Excel Connector
![Page 21: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/21.jpg)
BigQuery Tools
● BigQuery Excel Connector
● bq Command Line
● BigQuery Browser Tool
● Visualization & BI Tools
● ETL Tools
● ODBC Connector
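The `bq` command-line tool is easy to drive from scripts. This Python sketch builds the argument list for a `bq query` call using the standard-SQL, `--max_rows` and `--dry_run` flags. It prints the equivalent shell command but does not run it. The helper name is hypothetical. Actually executing the command requires the Cloud SDK to be installed and authenticated.

```python
import shlex
from typing import List


def bq_query_argv(sql: str, max_rows: int = 100, dry_run: bool = False) -> List[str]:
    """Build the argument list for a `bq query` call using standard SQL."""
    argv = ["bq", "query", "--use_legacy_sql=false", f"--max_rows={max_rows}"]
    if dry_run:
        # Validates the query and reports bytes processed without running it.
        argv.append("--dry_run")
    argv.append(sql)  # passed as one argument, so no shell quoting is needed
    return argv


argv = bq_query_argv("SELECT 17 AS answer", max_rows=10)
print(shlex.join(argv))  # shows the equivalent shell command line
# To actually run it (requires an installed, authenticated Cloud SDK):
#   import subprocess; subprocess.run(argv, check=True)
```

A dry run is a cheap way to see how much data a query would scan before paying for it.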
![Page 22: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/22.jpg)
Big Data Solution with BigQuery
![Page 23: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/23.jpg)
Big Data Solution with BigQuery
Data Pipeline - transforming and loading data into BigQuery
Using Google Cloud Platform to get data into BigQuery involves uploading CSV or
JavaScript Object Notation (JSON) files to Google Cloud Storage and then loading
the data into BigQuery. Alternatively, the REST API can be used for programmatic
integration with the existing computing environment.
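The "load from Cloud Storage" step is a load job submitted through jobs.insert (POST /projects/projectId/jobs). This Python sketch builds the JSON body for such a job. The project, dataset, table and `gs://` bucket names are hypothetical, and it assumes CSV files carry one header row. BigQuery expects JSON input as newline-delimited JSON (one object per line), which the API calls `NEWLINE_DELIMITED_JSON`.

```python
import json
from typing import Iterable


def build_load_job(project_id: str, dataset_id: str, table_id: str,
                   gcs_uris: Iterable[str], source_format: str = "CSV") -> dict:
    """Body for jobs.insert that loads files already staged in Cloud Storage."""
    if source_format not in ("CSV", "NEWLINE_DELIMITED_JSON"):
        raise ValueError("this sketch only handles CSV and newline-delimited JSON")
    load = {
        "sourceUris": list(gcs_uris),
        "sourceFormat": source_format,
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "autodetect": True,                # let BigQuery infer the schema
        "writeDisposition": "WRITE_APPEND",  # append to the table if it already exists
    }
    if source_format == "CSV":
        load["skipLeadingRows"] = 1  # assumes each CSV file has a header row
    return {"configuration": {"load": load}}


# Hypothetical names; a single "*" wildcard may match many staged files.
job = build_load_job("my-project", "logs", "events", ["gs://my-bucket/events-*.csv"])
print(json.dumps(job, indent=2))
```

Instead of `autodetect`, an explicit `schema` can be supplied when the column types must be fixed in advance.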
Data Visualization - performing data analysis on BigQuery and visualizing the results
A custom, web-based dashboard can be built on Google App Engine, using the BigQuery REST
API to execute queries and Google Chart Tools to visualize the results.
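Query responses from the REST API return a `schema` and `rows`. Each row has the form `{"f": [{"v": ...}]}`, and cell values arrive encoded as strings. A dashboard backend therefore has to reshape the response before charting. This Python sketch turns a response into the array-of-arrays (header row first) that Google Charts' `google.visualization.arrayToDataTable` accepts. It handles flat schemas only. The sample response is hypothetical.

```python
from typing import Any, Dict, List

# Converters for a few numeric BigQuery types; other types are passed through as strings.
_CASTS = {"INTEGER": int, "INT64": int, "FLOAT": float, "FLOAT64": float}


def to_chart_array(response: Dict[str, Any]) -> List[List[Any]]:
    """Turn a jobs.query / getQueryResults response into a header row plus data rows."""
    fields = response["schema"]["fields"]
    table: List[List[Any]] = [[f["name"] for f in fields]]
    for row in response.get("rows", []):
        cells = []
        for field_def, cell in zip(fields, row["f"]):
            value = cell["v"]
            cast = _CASTS.get(field_def["type"])
            # NULL cells arrive as None and are kept as-is.
            cells.append(cast(value) if cast and value is not None else value)
        table.append(cells)
    return table


# Hypothetical response, shaped like the REST API's JSON.
sample = {
    "schema": {"fields": [{"name": "country", "type": "STRING"},
                          {"name": "visits", "type": "INTEGER"}]},
    "rows": [{"f": [{"v": "IN"}, {"v": "5"}]}, {"f": [{"v": "US"}, {"v": "7"}]}],
}
print(to_chart_array(sample))  # [['country', 'visits'], ['IN', 5], ['US', 7]]
```

The resulting list can be serialised with `json.dumps` and handed to the page, where `arrayToDataTable` builds the chart's data table.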
![Page 24: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/24.jpg)
Pricing Model

| Action | Example |
|---|---|
| Loading Data | Loading files/data into BigQuery |
| Exporting Data | Exporting data, saving results from BigQuery |
| Table Reads | Browsing through data |
| Table Copies | Copying an existing table to a new table |

| Storage Action | Cost |
|---|---|
| Storage | $0.020 per GB, per month |
| Streaming Inserts | Free until January 1, 2015; after January 1, 2015, $0.01 per 100,000 rows |

| Query Pricing | Cost |
|---|---|
| On-demand | $5 per TB |
| Reserved Capacity | 5 GB per second for $20k/month |

Wow, that’s like 800 MB of storage per month for 1 Rupee; even the Internet ain’t that cheap here.
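The pricing tables lend themselves to quick back-of-the-envelope estimates. The Python sketch below uses the 2014 list prices quoted on the slide. Current prices differ. It also assumes an exchange rate of about 61 INR per USD, a hypothetical figure roughly in line with 2014. From these inputs it estimates a monthly bill and checks the "800 MB for 1 Rupee" remark, which works out for storage at $0.020 per GB per month. The workload numbers are hypothetical.

```python
# 2014 list prices as quoted on the slide; current prices differ.
STORAGE_USD_PER_GB_MONTH = 0.020
STREAMING_USD_PER_100K_ROWS = 0.01
ON_DEMAND_USD_PER_TB = 5.0
RESERVED_USD_PER_MONTH = 20_000.0


def monthly_cost(stored_gb: float, scanned_tb: float, streamed_rows: int = 0) -> float:
    """On-demand bill for one month: storage + queries + streaming inserts."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    queries = scanned_tb * ON_DEMAND_USD_PER_TB
    streaming = streamed_rows / 100_000 * STREAMING_USD_PER_100K_ROWS
    return storage + queries + streaming


def reserved_break_even_tb() -> float:
    """TB scanned per month at which reserved capacity costs the same as on-demand."""
    return RESERVED_USD_PER_MONTH / ON_DEMAND_USD_PER_TB


def mb_stored_per_rupee(inr_per_usd: float) -> float:
    """Megabytes that 1 rupee stores for a month, at the given exchange rate."""
    usd_per_rupee = 1 / inr_per_usd
    return usd_per_rupee / STORAGE_USD_PER_GB_MONTH * 1000  # decimal GB -> MB


# Hypothetical workload: 500 GB stored, 2 TB scanned, 1 million streamed rows.
print(round(monthly_cost(500, 2, 1_000_000), 2))  # 20.1
print(reserved_break_even_tb())                   # 4000.0
print(round(mb_stored_per_rupee(61.0)))           # 820
```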
![Page 25: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/25.jpg)
Where to use?
● Not a replacement for traditional systems, but it complements the ecosystem!!
● Its major strength is handling large datasets
● Its major usage is in data analytics
● An important component of Google Cloud Platform
● People are interested in numbers/data, and they want them quickly…
Google BigQuery is the future of Analytics!!
![Page 26: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/26.jpg)
Any questions?
What we covered ...
✓ What is Big Data?
✓ Available Big Data Solutions & Issues
✓ Why Google BigQuery?
✓ Features, Components & Tools
✓ RESTful API
✓ Demo using Web Interface
✓ BigQuery Tools
✓ Big Data Solution with BigQuery
✓ Pricing Model
✓ Usage
![Page 27: Exploring BigData with Google BigQuery](https://reader033.fdocuments.in/reader033/viewer/2022052301/55a20a7d1a28aba0368b467c/html5/thumbnails/27.jpg)
https://bigquery.cloud.google.com
No registration, just sign in with your Google account.

About the presenter
Follow Dharmesh Vaya on @DRVaya
or subscribe to my blog: http://drvaya.wordpress.com/
You can also add me on +DharmeshVaya