Where is my data (in the cloud) tamir dresher

Tamir Dresher Senior Software Architect May 19, 2014 Where is my Data? (In the Cloud)

Transcript of Where is my data (in the cloud) tamir dresher

Page 1: Where is my data (in the cloud)   tamir dresher

Tamir Dresher

Senior Software Architect

May 19, 2014

Where is my Data? (In the Cloud)

Page 2: Where is my data (in the cloud)   tamir dresher

About Me

• Software architect, consultant and instructor

• Software Engineering Lecturer @ Ruppin Academic Center

• Technology addict

• 10 years of experience

• .NET and Native Windows Programming

@tamir_dresher

[email protected]

http://www.TamirDresher.com

Page 3: Where is my data (in the cloud)   tamir dresher

Agenda

• Storage

• Blob

• Azure SQL Server

• Azure Tables

• HDInsight

Page 5: Where is my data (in the cloud)   tamir dresher

Storage

Page 6: Where is my data (in the cloud)   tamir dresher

Storage Prices

Page 7: Where is my data (in the cloud)   tamir dresher

Types of information

Page 8: Where is my data (in the cloud)   tamir dresher

Windows Azure Growing Global Presence

Data centers: North America, Europe, Asia Pacific

Storage SLA – 99.99% (at most 52.56 minutes of downtime per year)

http://azure.microsoft.com/en-us/support/legal/sla
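The 52.56-minute figure follows directly from the SLA percentage. A minimal sketch of that arithmetic, assuming a 365-day year; the class and method names are illustrative and not part of any Azure SDK:

```csharp
using System;
using System.Globalization;

public static class SlaDowntime
{
    // Minutes of downtime per year that an availability percentage still allows.
    // Assumes a non-leap year of 365 days (525,600 minutes).
    public static double AllowedMinutesPerYear(double availabilityPercent)
    {
        const double minutesPerYear = 365 * 24 * 60;
        return minutesPerYear * (100.0 - availabilityPercent) / 100.0;
    }

    public static void Main()
    {
        double minutes = AllowedMinutesPerYear(99.99);
        // Invariant culture keeps the decimal separator a dot on every machine.
        Console.WriteLine(minutes.ToString("F2", CultureInfo.InvariantCulture)); // prints "52.56"
    }
}
```

The same method shows how quickly the budget shrinks or grows: 99.9% allows roughly ten times as much downtime as 99.99%.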

Page 9: Where is my data (in the cloud)   tamir dresher

AZURE BLOBS

Page 10: Where is my data (in the cloud)   tamir dresher

What is a BLOB

• BLOB – Binary Large OBject

• Storage for any type of entity such as binary files and text documents

• Distributed File Service (DFS)

– Scalability and High availability

• A BLOB file is distributed across multiple servers and replicated at least 3 times

Page 11: Where is my data (in the cloud)   tamir dresher

Blob Storage Concepts

Page 12: Where is my data (in the cloud)   tamir dresher

Blob Operations

REST

Page 13: Where is my data (in the cloud)   tamir dresher

DEMO: Creating a Blob

Page 14: Where is my data (in the cloud)   tamir dresher

BLOBS

• Block blob - up to 200 GB in size

• Page blobs – up to 1 TB in size

• Total Account Capacity - 500 TB

• Pricing

– Storage capacity used

– Replication option (LRS, GRS, RA-GRS)

– Number of requests

– Data egress

– http://azure.microsoft.com/en-us/pricing/details/storage/

Page 15: Where is my data (in the cloud)   tamir dresher

SQL AZURE

Page 16: Where is my data (in the cloud)   tamir dresher

SQL Azure

• SQL Server in the cloud

• No administrative overheads

• High Availability

• pay-as-you-grow pricing

• Familiar Development Model*

* Despite missing features and some limitations - http://msdn.microsoft.com/en-us/library/ff394115.aspx

Page 17: Where is my data (in the cloud)   tamir dresher

DEMO: Creating and Using SQL Azure

Page 18: Where is my data (in the cloud)   tamir dresher

SQL Azure – Pricing

Page 19: Where is my data (in the cloud)   tamir dresher

Case Study - https://haveibeenpwned.com/

Page 20: Where is my data (in the cloud)   tamir dresher

Case Study - https://haveibeenpwned.com/

• http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html

• How do I make querying 154 million email addresses as fast as possible?

• If I want 100 GB of SQL Server and I want to hit it 10 million times, it’ll cost me $176 a month (now it's ~$20)

Page 21: Where is my data (in the cloud)   tamir dresher

AZURE TABLES

Page 22: Where is my data (in the cloud)   tamir dresher

Table Storage Concepts

Page 23: Where is my data (in the cloud)   tamir dresher

Table Storage

• Not RDBMS – No relationships between entities

– NoSQL

• Entity can have up to 255 properties - Up to 1MB per entity

• Mandatory properties for every entity

– PartitionKey & RowKey (the only indexed properties)

• Together they uniquely identify an entity

• The same RowKey can be used in different PartitionKeys

• They define the sort order

– Timestamp - Optimistic Concurrency

Page 24: Where is my data (in the cloud)   tamir dresher

No Fixed Schema

Page 25: Where is my data (in the cloud)   tamir dresher

Table Object Model

• ITableEntity interface – PartitionKey, RowKey, Timestamp, and ETag properties

– Implemented by TableEntity and DynamicTableEntity

// This class defines one additional property of integer type;
// since it derives from TableEntity it will be automatically
// serialized and deserialized.
public class SampleEntity : TableEntity
{
    public int SampleProperty { get; set; }
}

Page 26: Where is my data (in the cloud)   tamir dresher

Sample – Inserting an Entity into a Table

// You will need the following using statements
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Assumptions: storageAccount is an already initialized CloudStorageAccount,
// and CustomerEntity derives from TableEntity with a constructor that takes
// the PartitionKey and RowKey values.

// Create the table client.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable peopleTable = tableClient.GetTableReference("people");
peopleTable.CreateIfNotExists();

// Create a new customer entity.
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1.Email = "[email protected]";
customer1.PhoneNumber = "425-555-0101";

// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.Insert(customer1);

// Submit the operation to the table service.
peopleTable.Execute(insertCustomer1);

Page 27: Where is my data (in the cloud)   tamir dresher

Retrieve

// Create the table client.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable peopleTable = tableClient.GetTableReference("people");

// Retrieve the entity with partition key of "Smith" and row key of "Jeff"
TableOperation retrieveJeffSmith =
    TableOperation.Retrieve<CustomerEntity>("Smith", "Jeff");

// Retrieve entity
CustomerEntity specificEntity =
    (CustomerEntity)peopleTable.Execute(retrieveJeffSmith).Result;

Page 28: Where is my data (in the cloud)   tamir dresher

Table Storage – Important Points

• Azure Tables can store TBs of data

• Table operations are fast

• Tables are distributed – PartitionKey defines the partition

– A table might be stored in different partitions on different storage devices.

Page 29: Where is my data (in the cloud)   tamir dresher

Pricing

Page 30: Where is my data (in the cloud)   tamir dresher

Case Study - https://haveibeenpwned.com/

Page 31: Where is my data (in the cloud)   tamir dresher

Case Study - https://haveibeenpwned.com/

• How do I make querying 154 million email addresses as fast as possible?

• [email protected] – the domain is the partition key and the alias is the row key

• If I want 100 GB of storage and I want to hit it 10 million times, it’ll cost me $8 a month

• SQL Server will cost $176 a month - 22 times more expensive
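The key design from the case study can be sketched as a small helper that turns an address into the two keys. This is only an illustration: the `EmailKeys` class, the lowercasing step and the sample address are assumptions, not code from the case study.

```csharp
using System;

public static class EmailKeys
{
    // Splits an address into (PartitionKey, RowKey) = (domain, alias).
    // Lowercasing both parts is an assumption made here so that lookups
    // do not depend on how the address was typed.
    public static Tuple<string, string> ToKeys(string email)
    {
        int at = email.LastIndexOf('@');
        if (at <= 0 || at == email.Length - 1)
            throw new ArgumentException("Not a valid email address", "email");

        string alias = email.Substring(0, at);
        string domain = email.Substring(at + 1);
        return Tuple.Create(domain.ToLowerInvariant(), alias.ToLowerInvariant());
    }

    public static void Main()
    {
        // Hypothetical address used only for the demonstration.
        var keys = ToKeys("Someone@Example.com");
        Console.WriteLine("PartitionKey=" + keys.Item1 + ", RowKey=" + keys.Item2);
        // prints "PartitionKey=example.com, RowKey=someone"
    }
}
```

With these two values, checking one address becomes a single point lookup (a `TableOperation.Retrieve` with both keys) rather than a scan.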

Page 32: Where is my data (in the cloud)   tamir dresher

HDINSIGHT

Page 33: Where is my data (in the cloud)   tamir dresher

Hadoop in the cloud

• Hadoop on Azure Cloud

• Some Facts:

– Bing ingests > 7 petabytes a month

– The Twitter community generates over 1 terabyte of tweets every day

– Cisco predicts that by 2013 annual internet traffic will reach 667 exabytes

Sources: The Economist, Feb ‘10; DBMS2; Microsoft Corp

Page 34: Where is my data (in the cloud)   tamir dresher

MapReduce – The BigData Power

• Map – takes an input and outputs key/value pairs

(Key1, Value1), (Key2, Value2), …, (Keyn, Valuen)

Page 35: Where is my data (in the cloud)   tamir dresher

MapReduce – The BigData Power

• Reduce – takes the group of values for each key and produces a new group of values

Key1: [value1-1, value1-2, …] -> [new_value1-1, new_value1-2, …]

Key2: [value2-1, value2-2, …] -> [new_value2-1, new_value2-2, …]

:

Keyn: [valueN-1, valueN-2, …] -> [new_valueN-1, new_valueN-2, …]

Page 36: Where is my data (in the cloud)   tamir dresher

MapReduce - How Does It Work?

Page 37: Where is my data (in the cloud)   tamir dresher

So How Does It Work?

Page 38: Where is my data (in the cloud)   tamir dresher

Finding common friends

• Facebook shows you how many common friends you have with someone

• As of 01.01.2014, Facebook had 1,310,000,000 active users with 130 friends on average

• Calculating the mutual friends

Page 39: Where is my data (in the cloud)   tamir dresher

Finding common friends

• We can represent a friend relationship as:

Someone -> [List of his/her friends]

• Note that a friend relationship is symmetrical

– if A is a friend of B then B is a friend of A

Page 40: Where is my data (in the cloud)   tamir dresher

Example of Friends file

• U1 -> U2 U3 U4

• U2 -> U1 U3 U4 U5

• U3 -> U1 U2 U4 U5

• U4 -> U1 U2 U3 U5

• U5 -> U2 U3 U4

Page 42: Where is my data (in the cloud)   tamir dresher

Designing our MapReduce job - Mapper

• Each line from the file will be an input line to the Mapper

• The Mapper will output key-value pairs

• Key: (user, friend)

– Sorted, so the friend might come before the user

• Value: list of friends

• Having the key sorted helps the reducer: both records for the same pair are delivered together

Page 45: Where is my data (in the cloud)   tamir dresher

Mapper Example – final result

Given the line: U1 -> U2 U3 U4
Mapper output:
(U1 U2) -> U2 U3 U4
(U1 U3) -> U2 U3 U4
(U1 U4) -> U2 U3 U4

Given the line: U2 -> U1 U3 U4 U5
Mapper output:
(U1 U2) -> U1 U3 U4 U5
(U2 U3) -> U1 U3 U4 U5
(U2 U4) -> U1 U3 U4 U5
(U2 U5) -> U1 U3 U4 U5

Given the line: U3 -> U1 U2 U4 U5
Mapper output:
(U1 U3) -> U1 U2 U4 U5
(U2 U3) -> U1 U2 U4 U5
(U3 U4) -> U1 U2 U4 U5
(U3 U5) -> U1 U2 U4 U5

Given the line: U4 -> U1 U2 U3 U5
Mapper output:
(U1 U4) -> U1 U2 U3 U5
(U2 U4) -> U1 U2 U3 U5
(U3 U4) -> U1 U2 U3 U5
(U4 U5) -> U1 U2 U3 U5

Given the line: U5 -> U2 U3 U4
Mapper output:
(U2 U5) -> U2 U3 U4
(U3 U5) -> U2 U3 U4
(U4 U5) -> U2 U3 U4

Page 46: Where is my data (in the cloud)   tamir dresher

Designing our MapReduce job - Reducer

• The input for the reducer will be structured as:

(friend1, friend2) (friend1 friends) (friend2 friends)

• The reducer will find the intersection between the lists

• Output:

(friend1, friend2) (intersection of friend1 and friend2 friends)
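The mapper and reducer design can be tried out locally before involving a cluster. The following is a minimal in-memory sketch using only LINQ; it does not use the Hadoop SDK, and the class `CommonFriendsLocal` and its method names are illustrative assumptions. Grouping by key stands in for the shuffle phase that Hadoop performs between map and reduce.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonFriendsLocal
{
    // Map: for "U1 -> U2 U3 U4" emit one record per friend, keyed by the sorted pair.
    public static IEnumerable<KeyValuePair<string, string>> Map(string line)
    {
        var tokens = line.Split(new[] { " ", "->" }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length == 0) yield break;

        var user = tokens[0];
        var friends = tokens.Skip(1).ToArray();
        var friendList = string.Join(" ", friends);
        foreach (var friend in friends)
        {
            var pair = new[] { user, friend };
            Array.Sort(pair, StringComparer.Ordinal); // (U2 U1) becomes (U1 U2)
            yield return new KeyValuePair<string, string>(string.Join(" ", pair), friendList);
        }
    }

    // Reduce: intersect the two friend lists collected for one pair.
    public static string Reduce(IEnumerable<string> friendLists)
    {
        var lists = friendLists.Select(l => l.Split(' ')).ToList();
        if (lists.Count < 2) return string.Empty; // only one side listed the other
        var common = lists[0].Intersect(lists[1]).OrderBy(f => f, StringComparer.Ordinal);
        return string.Join(" ", common);
    }

    // Map every line, group by key (the "shuffle"), then reduce each group.
    public static SortedDictionary<string, string> Run(IEnumerable<string> lines)
    {
        var result = new SortedDictionary<string, string>(StringComparer.Ordinal);
        foreach (var group in lines.SelectMany(Map).GroupBy(kv => kv.Key, kv => kv.Value))
            result[group.Key] = Reduce(group);
        return result;
    }

    public static void Main()
    {
        var friendsFile = new[]
        {
            "U1 -> U2 U3 U4",
            "U2 -> U1 U3 U4 U5",
            "U3 -> U1 U2 U4 U5",
            "U4 -> U1 U2 U3 U5",
            "U5 -> U2 U3 U4"
        };
        foreach (var entry in Run(friendsFile))
            Console.WriteLine("(" + entry.Key + ") -> (" + entry.Value + ")");
        // first line printed: "(U1 U2) -> (U3 U4)"
    }
}
```

Running it on the five-line friends file yields one line per pair of friends, which makes it easy to check the design against hand-computed intersections.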

Page 47: Where is my data (in the cloud)   tamir dresher

Reducer Example

Given the line:                                 Reducer output:
(U1 U2) -> (U1 U3 U4 U5) (U2 U3 U4)             (U1 U2) -> (U3 U4)
(U1 U3) -> (U1 U2 U4 U5) (U2 U3 U4)             (U1 U3) -> (U2 U4)
(U1 U4) -> (U1 U2 U3 U5) (U2 U3 U4)             (U1 U4) -> (U2 U3)
(U2 U3) -> (U1 U2 U4 U5) (U1 U3 U4 U5)          (U2 U3) -> (U1 U4 U5)
(U2 U4) -> (U1 U2 U3 U5) (U1 U3 U4 U5)          (U2 U4) -> (U1 U3 U5)
(U2 U5) -> (U1 U3 U4 U5) (U2 U3 U4)             (U2 U5) -> (U3 U4)
(U3 U4) -> (U1 U2 U3 U5) (U1 U2 U4 U5)          (U3 U4) -> (U1 U2 U5)
(U3 U5) -> (U1 U2 U4 U5) (U2 U3 U4)             (U3 U5) -> (U2 U4)
(U4 U5) -> (U1 U2 U3 U5) (U2 U3 U4)             (U4 U5) -> (U2 U3)

Page 48: Where is my data (in the cloud)   tamir dresher

Creating C# MapReduce

Page 49: Where is my data (in the cloud)   tamir dresher

Creating C# MapReduce - Mapper

using System;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

public class CommonFriendsMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        // Tokenize the line; an optional "->" between the user and the friends is skipped.
        var strings = inputLine.Split(new[] { " ", "->" }, StringSplitOptions.RemoveEmptyEntries);
        if (strings.Any())
        {
            var currentUser = strings[0];
            var friends = strings.Skip(1).ToArray();
            var friendList = string.Join(" ", friends);
            foreach (var friend in friends)
            {
                // Sort the pair so (U2 U1) and (U1 U2) produce the same key.
                var keyArr = new[] { currentUser, friend };
                Array.Sort(keyArr);
                var key = string.Join(" ", keyArr);
                context.EmitKeyValue(key, friendList);
            }
        }
    }
}

Page 50: Where is my data (in the cloud)   tamir dresher

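Creating C# MapReduce - Reduce

using System.Collections.Generic;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

public class CommonFriendsReducer : ReducerCombinerBase
{
    public override void Reduce(string key, IEnumerable<string> strings, ReducerCombinerContext context)
    {
        // Each value is the friend list of one of the two users in the key.
        var friendsLists = strings
            .Select(friendList => friendList.Split(' '))
            .ToList();

        // Skip pairs for which only one side was emitted (non-symmetrical input).
        if (friendsLists.Count < 2) return;

        var intersection = friendsLists[0].Intersect(friendsLists[1]);
        context.EmitKeyValue(key, string.Join(" ", intersection));
    }
}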

Page 51: Where is my data (in the cloud)   tamir dresher

Creating C# MapReduce – Hadoop Job

// The cluster and storage variables passed to Hadoop.Connect are assumed
// to be defined elsewhere with your own cluster details.
HadoopJobConfiguration myConfig = new HadoopJobConfiguration();
myConfig.InputPath = "wasb:///example/data/friends/friends";
myConfig.OutputFolder = "wasb:///example/data/friends/output";

Environment.SetEnvironmentVariable("HADOOP_HOME", @"c:\hadoop");
Environment.SetEnvironmentVariable("Java_HOME", @"c:\hadoop\jvm");

var hadoop = Hadoop.Connect(clusterUri,
    clusterUserName,
    hadoopUserName,
    clusterPassword,
    azureStorageAccount,
    azureStorageKey,
    azureStorageContainer,
    createContinerIfNotExist);

var jobResult = hadoop.MapReduceJob.Execute<CommonFriendsMapper, CommonFriendsReducer>(myConfig);

int exitCode = jobResult.Info.ExitCode; // (0 – success, otherwise – failure)

Page 52: Where is my data (in the cloud)   tamir dresher

Pricing

10-node cluster that will exist for 24 hours:

• Secure Gateway Node - free

• Head node - 15.36 USD per 24-hour day

• 1 data node - 7.68 USD per 24-hour day

• 10 data nodes - 76.80 USD per 24-hour day

• Total: 92.16 USD
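The total above is simple arithmetic over the per-node daily prices. A small sketch of that calculation, assuming billing is linear per hour; the hourly rates are derived here by dividing the quoted daily prices by 24, and the class and method names are illustrative:

```csharp
using System;
using System.Globalization;

public static class HdInsightCost
{
    // Hourly rates derived from the slide's daily prices (assumption: linear billing).
    private const decimal HeadNodePerHour = 0.64m; // 15.36 USD / 24 hours
    private const decimal DataNodePerHour = 0.32m; //  7.68 USD / 24 hours

    // Cost in USD of one head node plus the given number of data nodes.
    // The secure gateway node is free, so it adds nothing.
    public static decimal ClusterCost(int dataNodes, int hours)
    {
        return hours * (HeadNodePerHour + dataNodes * DataNodePerHour);
    }

    public static void Main()
    {
        decimal total = ClusterCost(10, 24);
        Console.WriteLine(total.ToString(CultureInfo.InvariantCulture)); // prints "92.16"
    }
}
```

Because the cluster is billed only while it exists, the same method also shows the saving from deleting the cluster once the job finishes, for example `ClusterCost(10, 2)` for a two-hour run.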

Page 53: Where is my data (in the cloud)   tamir dresher

WRAP UP

Page 54: Where is my data (in the cloud)   tamir dresher

Comparing the alternatives

Storage Type: BLOB

When should you use: unstructured data; files

Implications: application logic responsibility; consider using HDInsight (Hadoop)

Storage Type: SQL Server

When should you use: structured relational data; ACID transactions; max 150GB (500GB in preview)

Implications: SQL DML+DDL; could affect scalability; BI abilities; reporting

Storage Type: Azure Tables

When should you use: structured data; loose schema; geo replication (high DR); auto sharding

Implications: OData, REST; application logic responsibility (multiple schemas)

Page 55: Where is my data (in the cloud)   tamir dresher

What have we seen

• Azure Blobs

• Azure Tables

• Azure SQL Server

• HDInsight

Page 56: Where is my data (in the cloud)   tamir dresher

What’s Next

• NoSQL – MongoDB, Cassandra, CouchDB, RavenDB

• Hadoop ecosystem – Hive, Pig, SQOOP, Mahout

• http://blogs.msdn.com/b/windowsazure/

• http://blogs.msdn.com/b/windowsazurestorage/

• http://blogs.msdn.com/b/bigdatasupport/

Page 57: Where is my data (in the cloud)   tamir dresher

Presenter contact details

c: +972-52-4772946

t: @tamir_dresher

e: [email protected]

TamirDresher.com

w: www.codevalue.net