AWS_Data_Pipeline

Page 1: AWS_Data_Pipeline

AWS Data Pipeline

~ Ahasan Habib

Technical Project Manager,

Ixora Solutions Ltd.

Dhaka, Bangladesh

Page 2: AWS_Data_Pipeline

What is AWS Data Pipeline?

● Web service

● Data movement & transformation

● Data-driven workflows

Page 3: AWS_Data_Pipeline

Benefits

● Sequence, schedule, run, and manage recurring data processing workloads reliably

● Cost-effective

● Easy to design ETL

● Supports both structured and unstructured data

● Supports on-premises and cloud data

Page 4: AWS_Data_Pipeline

Data Pipeline Components

● Pipeline Definition

● Pipeline: schedules & runs tasks

● Task Runner

Page 5: AWS_Data_Pipeline

Data Pipeline Objects

● ShellCommand Activity

● S3 Data Node

{
  "id" : "CreateDirectory",
  "type" : "ShellCommandActivity",
  "command" : "mkdir new-directory"
}

{
  "id" : "OutputData",
  "type" : "S3DataNode",
  "schedule" : { "ref" : "CopyPeriod" },
  "filePath" : "s3://myBucket/#{@scheduledStartTime}.csv"
}
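The two objects above could be collected into a single definition file. The following is a minimal Python sketch, assuming the file wraps its objects in a top-level "objects" array. The CopyPeriod schedule is referenced on the slide but never shown, so the one below is a hypothetical placeholder, and build_definition is an illustrative helper name. A real pipeline would likely need further fields (for example, where the activity runs).

```python
import json

# Objects copied from the slide.
create_directory = {
    "id": "CreateDirectory",
    "type": "ShellCommandActivity",
    "command": "mkdir new-directory",
}
output_data = {
    "id": "OutputData",
    "type": "S3DataNode",
    "schedule": {"ref": "CopyPeriod"},
    "filePath": "s3://myBucket/#{@scheduledStartTime}.csv",
}

# Hypothetical placeholder: the slide references "CopyPeriod" but does not
# define it, so these values are assumptions for illustration only.
copy_period = {
    "id": "CopyPeriod",
    "type": "Schedule",
    "period": "1 hours",
    "startDateTime": "2012-09-01T00:00:00",
}


def build_definition(*objects):
    """Serialize pipeline objects into one JSON document (assumed layout)."""
    return json.dumps({"objects": list(objects)}, indent=2)


definition_text = build_definition(copy_period, create_directory, output_data)
print(definition_text)
```

Building the objects as dictionaries first keeps each one easy to check before anything is written to disk.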

Page 6: AWS_Data_Pipeline

● EC2 Resource

● Schedule

{
  "id" : "Hourly",
  "type" : "Schedule",
  "period" : "1 hours",
  "startDateTime" : "2012-09-01T00:00:00",
  "endDateTime" : "2012-10-01T00:00:00"
}

{
  "id" : "MyEC2Resource",
  "type" : "Ec2Resource",
  "actionOnTaskFailure" : "terminate",
  "actionOnResourceFailure" : "retryAll",
  "maximumRetries" : "1",
  "instanceType" : "m1.medium",
  "securityGroups" : [
    "test-group",
    "default"
  ],
  "keyPair" : "my-key-pair"
}
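To get a feel for what the Hourly schedule implies, the sketch below counts how many whole periods fit between its start and end times. It is plain date arithmetic in Python, not a description of how the service itself decides when to run. The helper names (parse_period, count_periods) and the list of accepted units are assumptions made for this example.

```python
from datetime import datetime, timedelta

# Schedule copied from the slide.
schedule = {
    "id": "Hourly",
    "type": "Schedule",
    "period": "1 hours",
    "startDateTime": "2012-09-01T00:00:00",
    "endDateTime": "2012-10-01T00:00:00",
}

# Units handled by this sketch only (an assumption, not the service's full list).
UNIT_SECONDS = {"minutes": 60, "hours": 3600, "days": 86400, "weeks": 604800}
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_period(period):
    """Turn a string such as "1 hours" into a timedelta."""
    count, unit = period.split()
    if not unit.endswith("s"):
        unit += "s"  # accept "1 hour" as well as "1 hours"
    return timedelta(seconds=int(count) * UNIT_SECONDS[unit])


def count_periods(sched):
    """Count the whole periods between startDateTime and endDateTime."""
    start = datetime.strptime(sched["startDateTime"], TIME_FORMAT)
    end = datetime.strptime(sched["endDateTime"], TIME_FORMAT)
    return (end - start) // parse_period(sched["period"])


print(count_periods(schedule))  # 30 days x 24 hours = 720
```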

Page 7: AWS_Data_Pipeline

Work with Other AWS Services

● Amazon DynamoDB

● Amazon RDS

● Amazon Redshift

● Amazon S3

● Amazon EC2

Page 8: AWS_Data_Pipeline

Accessing Data Pipeline

● AWS Management Console

● AWS CLI

● AWS SDK

● Query API
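When a pipeline is defined through an SDK instead of a definition file, each object is typically sent as a list of key/value fields. The sketch below converts a definition-file style object into that shape. It assumes the request layout used by the put_pipeline_definition call in the AWS SDK for Python (entries with id, name, and fields, where each field carries a key plus either stringValue or refValue). to_api_object is a hypothetical helper, and no AWS call is made here.

```python
def to_api_object(obj):
    """Convert a definition-file style object to an API-style object (assumed shape)."""
    fields = []
    for key, value in obj.items():
        if key in ("id", "name"):
            continue  # carried at the top level, not as fields
        # A list value becomes one field per element, all sharing the same key.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "ref" in item:
                fields.append({"key": key, "refValue": item["ref"]})
            else:
                fields.append({"key": key, "stringValue": str(item)})
    # Assumption: fall back to the id when the object has no explicit name.
    return {"id": obj["id"], "name": obj.get("name", obj["id"]), "fields": fields}


# Example input modelled on the objects shown earlier in the deck.
output_data = {
    "id": "OutputData",
    "type": "S3DataNode",
    "schedule": {"ref": "CopyPeriod"},
    "filePath": "s3://myBucket/#{@scheduledStartTime}.csv",
}

api_object = to_api_object(output_data)
for field in api_object["fields"]:
    print(field)
```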

Page 9: AWS_Data_Pipeline

Create Data Pipeline

● Compose Pipeline Definition objects in a file

● Definition File Structure

{
  "id": "S3DataInput",
  "type": "S3DataNode",
  "schedule": {"ref": "TheSchedule"},
  "filePath": "s3://bucket_name",
  "myCustomField": "This is a custom value in a custom field.",
  "my_customFieldReference": {"ref":"AnotherPipelineComponent"}
}
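Fields such as "schedule" and "my_customFieldReference" above point at other objects by id. Before uploading a definition, it can help to confirm that every such reference resolves. This is a minimal local check in Python with hypothetical helper names (find_refs, unresolved_refs); it does not replace whatever validation the service performs.

```python
def find_refs(value):
    """Yield each id named by a {"ref": ...} value, including inside lists."""
    if isinstance(value, dict) and set(value) == {"ref"}:
        yield value["ref"]
    elif isinstance(value, list):
        for item in value:
            yield from find_refs(item)


def unresolved_refs(objects):
    """Return (object id, field, missing id) for references with no matching object."""
    known_ids = {obj["id"] for obj in objects}
    missing = []
    for obj in objects:
        for key, value in obj.items():
            for ref in find_refs(value):
                if ref not in known_ids:
                    missing.append((obj["id"], key, ref))
    return missing


# Object copied from the slide; the objects it refers to are not shown there.
s3_data_input = {
    "id": "S3DataInput",
    "type": "S3DataNode",
    "schedule": {"ref": "TheSchedule"},
    "filePath": "s3://bucket_name",
    "myCustomField": "This is a custom value in a custom field.",
    "my_customFieldReference": {"ref": "AnotherPipelineComponent"},
}

for problem in unresolved_refs([s3_data_input]):
    print(problem)
```

With only the slide's object in the list, both references are reported as unresolved. Adding objects whose ids are TheSchedule and AnotherPipelineComponent would clear them.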

Page 10: AWS_Data_Pipeline

Step 1

Page 11: AWS_Data_Pipeline

Step 2

Page 12: AWS_Data_Pipeline

Step 3

Page 13: AWS_Data_Pipeline
Page 14: AWS_Data_Pipeline

Notification

● SNS

● Push Delivery

● Pub/sub Model

Page 15: AWS_Data_Pipeline

Q & A

Page 16: AWS_Data_Pipeline

“There's a lot of difference between listening and hearing.”

~G.K. Chesterton

THANK YOU