Add A Billion Row Data Warehouse To Your App

with Redshift, sql and duct tape

James Crisp, Tech Principal @ GetUp

Context

GetUp! is an independent movement to build a progressive Australia and bring participation back into our democracy.

• Lots of online campaigns, field actions and social media
• Supported by small donations
• 600K+ members
• Issues: Budget, Medicare, uni fees, Barrier Reef, forests, etc.

Big Rails App
• CMS + petitions, emailing MPs, donations, etc.
• Email blasting & segmenting
• Back office & mini-CRM
• 2 x [3 app, 2 worker, 1 DB servers], Au and Sg

Data Warehouse… why??

• Reporting & exports
• Experimental data science
• Stop locking up transactional DB!!
• More data sources (logs, CRM, ...) => customer
• Different schema & faster queries

Options
• Read-only replica of transactional DB
• Mongo / Cassandra / ...
• Hadoop, Pig, Hive
• Elasticsearch
• BIG SQL, e.g. Redshift

Why BIG SQL?
• Team skills: tech & data scientists
• Easy integration from SQL DB
• Good hosted options
• Fast performance, column-based
• Sets and aggregations
• Can do JSON for less structured data

Why Redshift?
• Fully hosted & managed, multi-node
• Fast & column-based, semi-compressed
• Relatively cheap and easy to try
• Good import options
• Massively expandable
• (Security & backup options)
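The toy Python sketch below illustrates "column-based, semi-compressed". It does not model Redshift's actual storage engine. It stores the same rows column by column, so an aggregate over one column touches only that column, and it run-length encodes a low-cardinality column. Redshift does offer run-length (RUNLENGTH) as one of several column encodings. The table, its column names and its values are all hypothetical.

```python
from itertools import groupby

# Hypothetical sample rows: (member_id, state, donation_cents)
ROWS = [
    (1, "NSW", 500),
    (2, "NSW", 1000),
    (3, "NSW", 2500),
    (4, "VIC", 500),
    (5, "VIC", 700),
    (6, "QLD", 1500),
]
COLUMNS = ("member_id", "state", "donation_cents")


def to_columns(rows, columns):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def run_length_encode(values):
    """Collapse runs of repeated adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


store = to_columns(ROWS, COLUMNS)

# SUM(donation_cents) only needs to read one column, not whole rows.
total_donations = sum(store["donation_cents"])

# Sorted, low-cardinality columns shrink a lot under run-length encoding.
encoded_state = run_length_encode(store["state"])
```

Reading one column instead of every row is what makes column stores good at sets and aggregations. How well a column compresses depends on its data and on the sort order of the table.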

What is Redshift… really?

• Heavily modified fork of PostgreSQL 8

• Specialised data storage & query engine

• Can use normal ODBC/JDBC/Postgres clients to connect

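using System.Data;
using System.Data.Odbc;

// server, DBName, masterUsername, masterUserPassword and port hold your cluster's settings
string connString = "Driver={PostgreSQL Unicode};" +
    string.Format("Server={0};Database={1};UID={2};PWD={3};Port={4};SSL=true;Sslmode=Require",
        server, DBName, masterUsername, masterUserPassword, port);
string sql = "SELECT * FROM mytable;";

using (OdbcConnection conn = new OdbcConnection(connString))
{
    conn.Open();
    DataSet ds = new DataSet();
    OdbcDataAdapter da = new OdbcDataAdapter(sql, conn);
    da.Fill(ds);
    DataTable dt = ds.Tables[0];
    foreach (DataRow row in dt.Rows)
    {
        // Do something useful with each row
    }
}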

$DBConnectionString = "Driver={PostgreSQL UNICODE};Server=$MyServer;Port=$MyPort;Database=$MyDB;Uid=$MyUid;Pwd=$MyPass;"

$DBConn = New-Object System.Data.Odbc.OdbcConnection
$DBConn.ConnectionString = $DBConnectionString
$DBConn.Open()

$DBCmd = $DBConn.CreateCommand()
$DBCmd.CommandText = "SELECT * FROM mytable;"
$Reader = $DBCmd.ExecuteReader()
while ($Reader.Read()) {
    # Do something useful with each row, e.g. $Reader[0]
}
$Reader.Close()
$DBConn.Close()

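# 5439 is Redshift's default port (psql assumes 5432); change it if your cluster uses another
psql.exe -h $DBSERVER -p 5439 -U $DBUSER -d $DBName -f script.sql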

How much does it cost?
• Min multi-node cluster size 2; the leader node is free
• 2 x (2TB HDD, 15GB RAM) nodes: Syd $2.50/h, US $1.70/h (reserved 1 yr: Syd $13.2K, US $8.8K)
• Up to 2.56TB flash or 16TB HDD and 244GB RAM per node; clusters up to 1.6PB
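One way to read these prices is to compare a year of on-demand hours with the one-year reserved price. The Python sketch below uses the slide's figures for the 2-node cluster. It assumes the reserved figure is the total cost for the year and that the cluster runs around the clock. Check current AWS pricing before relying on the output.

```python
HOURS_PER_YEAR = 24 * 365

# Figures from the slide for a 2-node (2TB HDD, 15GB RAM each) cluster.
ON_DEMAND_PER_HOUR = {"Sydney": 2.50, "US": 1.70}
RESERVED_PER_YEAR = {"Sydney": 13_200, "US": 8_800}


def yearly_comparison(region):
    """Return (on-demand cost per year, reserved saving as a fraction, break-even days)."""
    hourly = ON_DEMAND_PER_HOUR[region]
    reserved = RESERVED_PER_YEAR[region]
    on_demand_year = hourly * HOURS_PER_YEAR
    saving = 1 - reserved / on_demand_year
    # Days of continuous on-demand use that would cost as much as the reservation.
    break_even_days = reserved / hourly / 24
    return on_demand_year, saving, break_even_days


for region in ON_DEMAND_PER_HOUR:
    year, saving, days = yearly_comparison(region)
    print(f"{region}: ${year:,.0f}/yr on demand, reserved saves {saving:.0%}, "
          f"break-even after {days:.0f} days")
```

With the slide's numbers, Sydney comes to $21,900/yr on demand. The reserved price saves about 40% and breaks even after 220 days. The US comes to $14,892/yr on demand, with about a 41% saving and break-even after roughly 216 days.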

Data Sources so far…

• Transactional DB
• Application request logs

Hooking up DB Data

• DB Server: map & dump MySQL tables to CSVs in S3
• Fire drop/create tables in Redshift, then LOAD the CSV data from S3
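Below is a minimal Python sketch of the "map & dump, then drop/create and load" steps. It assumes each table's rows have already been read from MySQL into tuples. The table name, column DDL, S3 path and IAM role ARN are hypothetical placeholders. Authorising COPY with an IAM role is only one option; access keys are another.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise row tuples as CSV text (no header); None is written as an empty field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def reload_table_sql(table, column_ddl, s3_path, iam_role):
    """Return SQL that drops, recreates and bulk-loads one table from S3.

    COPY loads every S3 object whose key starts with s3_path.
    """
    columns = ",\n    ".join(column_ddl)
    return (
        f"DROP TABLE IF EXISTS {table};\n"
        f"CREATE TABLE {table} (\n    {columns}\n);\n"
        f"COPY {table} FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"CSV;\n"
    )
```

To complete the loop, upload the CSV text to that S3 prefix (for example with boto3). Then run the generated SQL through any Postgres client, such as the psql command above.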

Hooking up Request Logs

• App Servers: upload JSON logs to S3
• Fire data load: Redshift maps & loads the JSON from S3
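One way to handle the "map & load JSON" step is to write one JSON object per line and give COPY a JSONPaths file that maps nested fields to columns. The Python sketch below makes several hypothetical choices: the log field names, the request_logs table and its columns, and the S3/IAM values are all placeholders.

```python
import json


def to_json_lines(entries):
    """Serialise log entries (dicts) as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(entry, sort_keys=True) + "\n" for entry in entries)


def jsonpaths_document(column_paths):
    """Build a JSONPaths file body; COPY matches the paths to columns by position."""
    return json.dumps({"jsonpaths": [path for _, path in column_paths]}, indent=2)


def copy_json_sql(table, column_paths, s3_data, s3_jsonpaths, iam_role):
    """Return a COPY statement that loads JSON from S3 using a JSONPaths file."""
    columns = ", ".join(column for column, _ in column_paths)
    return (
        f"COPY {table} ({columns})\n"
        f"FROM '{s3_data}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS JSON '{s3_jsonpaths}'\n"
        f"TIMEFORMAT 'auto';\n"
    )


# Hypothetical mapping of request_logs columns to fields in each log entry.
COLUMN_PATHS = [
    ("requested_at", "$.time"),
    ("request_path", "$.path"),
    ("user_agent", "$.headers.user_agent"),
    ("campaign", "$.params.campaign"),
]
```

The JSONPaths file, rather than FORMAT AS JSON 'auto', lets the app servers keep logging nested fields such as browser agent and params while the Redshift table stays flat.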

Demos

• Scripts & SQL for hooking up
• AWS Redshift console
• Connect & query

What have we used it for?

• Faster data science / reporting without locking up the transactional DB
• Combining sharded tables
• Request logs (browser agent, params, ...)
• Marking/tracking segments + debugging info
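Once the shards sit side by side in Redshift, combining them is mostly a UNION ALL. The small Python sketch below generates such a view. It assumes a hypothetical naming scheme in which each shard was loaded as its own identically shaped table (for example actions_shard_0). The view name and column names are placeholders too. Each row is tagged with the table it came from.

```python
def union_view_sql(view, shard_tables, columns):
    """Return SQL creating a view that stacks identically shaped shard tables.

    UNION ALL keeps every row; a plain UNION would also de-duplicate, which is
    slower and would drop legitimately identical rows. The 'shard' column
    records which table each row came from.
    """
    if not shard_tables:
        raise ValueError("need at least one shard table")
    column_list = ", ".join(columns)
    selects = [
        f"SELECT CAST('{table}' AS VARCHAR(128)) AS shard, {column_list} FROM {table}"
        for table in shard_tables
    ]
    body = "\nUNION ALL\n".join(selects)
    return f"CREATE OR REPLACE VIEW {view} AS\n{body};\n"
```

Reports can then query the view as if it were one table. Rebuilding it is cheap whenever a shard is added.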

Other uses

• Data exploration with tools like Tableau
• BI tools
• Denormalised data
• Dump in more data for a single view of the user (FB info, CRM, etc.)

Conclusion

• Easy to set up and use (for Win/.NET too)

• Super fast
• Reasonable price
• Met our needs well so far

We are hiring atm!