HUG Ireland Event - DNM slides

Post on 19-Jan-2017



Big Data, AWS and The Data Pipeline

• AWS and Big Data

• Amazon Solutions:

  − Kinesis

  − S3

  − EMR

  − Redshift

  − Data Pipeline

• DoneDeal Project Overview (Martin Peters)

• Amazon Solutions Applied to DoneDeal (Solution Overview)

• Q&A

Agenda

Why AWS and Big Data

• Agility – Amazon Web Services provides a broad range of services to help you build and deploy Big Data applications quickly and easily.

• Elasticity – AWS gives you fast access to flexible and low-cost IT resources, so you can rapidly scale virtually any big data application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing.

• Pay for what you need – With AWS you don’t need to make large upfront investments in time and money to build and maintain infrastructure. Instead, you can provision exactly the right type and size of resources you need to power your Big Data applications.

• Data Centric Services – Making the process of collecting, uploading, storing, and processing data on AWS faster, simpler, and increasingly comprehensive.

AWS Data Centric Services

• Data-centric Services:

  – Managing Databases is Painful and Difficult – Amazon RDS addresses many of the pain points and provides many ease-of-use features.

  – SQL Databases do not Work Well at Scale – Amazon DynamoDB provides a fully managed NoSQL model that has no inherent scalability limits.
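As a concrete sketch of what "fully managed NoSQL" looks like at the API level: DynamoDB's low-level PutItem call expects each attribute wrapped in a typed AttributeValue. The helper and the `listings` table below are hypothetical illustrations, not part of the DoneDeal system.

```python
# Sketch: marshal a plain Python dict into DynamoDB's typed
# AttributeValue format, as the low-level PutItem API expects.
# Table name "listings" and the attributes are hypothetical.

def to_dynamodb_item(record):
    """Convert str/int/float/bool values into DynamoDB AttributeValues."""
    item = {}
    for key, value in record.items():
        if isinstance(value, bool):          # check bool before int
            item[key] = {"BOOL": value}
        elif isinstance(value, (int, float)):
            item[key] = {"N": str(value)}    # numbers travel as strings
        else:
            item[key] = {"S": str(value)}
    return item

request = {
    "TableName": "listings",
    "Item": to_dynamodb_item({"ad_id": 12345, "county": "Dublin"}),
}
# With boto3 installed and credentials configured, this would be sent as:
#   boto3.client("dynamodb").put_item(**request)
print(request["Item"])
```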

AWS Data Centric Services Cont’d

• Hadoop is Difficult to Deploy and Manage – Amazon EMR can launch managed Hadoop clusters in minutes.

• Data Warehouses are Costly, Complex, and Slow – Amazon Redshift provides a fast, fully-managed petabyte-scale data warehouse at 1/10th the cost of traditional solutions.

• Streaming Data is Difficult to Capture – Amazon Kinesis facilitates real-time processing of data streams at terabyte scale.

AWS and Big Data Use Cases

• On-Demand Big Data Analytics

• See http://aws.amazon.com/big-data/use-cases/ for more examples:

  – Clickstream Analysis

  – Event-driven Extract, Transform, Load (ETL)

Big Data Challenges

• Ever Increasing

– Volume

– Velocity

– Variety

• Ever Decreasing Latency

– Big Data moving to Real-Time Big Data

• Multiple overlapping tools and platforms

Which Tools ?

Simplify the Model

Applying the Model to Solutions

QuickSight

Ingest: Stream to Kinesis

• Multiple options e.g.

Ingest And Store: Kinesis, KCL

• Why Stream Storage:

  – Convert multiple event streams into fewer persistent, sequential streams (easier to process)

  – Buffer and de-couple producers and consumers

• Kinesis:

  – Low Latency

  – High Durability

  – Managed Service

• Kinesis Connector Library:

  – Transform

  – Buffer

  – Filter

  – Emit
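To make the "persistent sequential streams" idea concrete: Kinesis routes each record to a shard by MD5-hashing its partition key onto the 128-bit key space, which the shards split into contiguous ranges. The sketch below mimics that routing locally; the stream name and event payloads are hypothetical.

```python
import hashlib

# Sketch of Kinesis shard routing: MD5 of the partition key, mapped
# onto shard_count equal ranges of the 128-bit hash space.

def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, Kinesis-style."""
    h = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    range_size = 2 ** 128 // shard_count
    return min(h // range_size, shard_count - 1)

# A batch shaped for the PutRecords API (stream name is hypothetical):
records = [
    {"Data": b'{"event":"ad_view","ad":12345}', "PartitionKey": "ad-12345"},
    {"Data": b'{"event":"ad_view","ad":67890}', "PartitionKey": "ad-67890"},
]
# With boto3 and credentials this would be submitted as:
#   boto3.client("kinesis").put_records(StreamName="events", Records=records)
for r in records:
    print(r["PartitionKey"], "-> shard", shard_for_key(r["PartitionKey"], 4))
```

Records with the same partition key always land on the same shard, which preserves per-key ordering for consumers such as the KCL.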

Dynamic Capacity: Auto scaling

• The scaling policies you define adjust the number of instances, within the minimum and maximum you set, based on the criteria you specify.
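The clamping behaviour described above can be sketched as a pure function; the adjustment sizes and bounds below are illustrative, not real policy settings.

```python
# Sketch of the scaling decision: apply a policy's adjustment to the
# current desired capacity, then clamp to the group's min/max bounds.

def apply_scaling_policy(current, adjustment, minimum, maximum):
    """Return the new desired capacity after a scaling adjustment."""
    return max(minimum, min(maximum, current + adjustment))

# Scale out by 2 on high CPU, but never beyond the group's bounds:
print(apply_scaling_policy(current=5, adjustment=2, minimum=2, maximum=6))   # 6
print(apply_scaling_policy(current=3, adjustment=-4, minimum=2, maximum=6))  # 2
```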

Store: Simple Storage Service (S3)

• Secure, Scalable, Reliable
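In a pipeline like this, a common S3 layout is date-partitioned object keys, so downstream EMR or Redshift jobs can read one day at a time. The bucket, prefix, and file names below are hypothetical.

```python
from datetime import datetime, timezone

# Sketch: build date-partitioned S3 keys like prefix/2017/01/19/file,
# a common convention for batch pipelines reading a day at a time.

def s3_key(prefix, event_time, filename):
    """Build a date-partitioned object key."""
    return "{}/{:%Y/%m/%d}/{}".format(prefix, event_time, filename)

ts = datetime(2017, 1, 19, tzinfo=timezone.utc)
key = s3_key("events", ts, "part-0000.gz")
print(key)  # events/2017/01/19/part-0000.gz
# With boto3 and credentials, the object would then be uploaded with:
#   boto3.client("s3").put_object(Bucket="my-bucket", Key=key, Body=b"...")
```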

Process: Elastic MapReduce (EMR)
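As a hedged sketch of "launch managed Hadoop clusters in minutes": a cluster launch is essentially one parameterised request. The names, instance types, and sizes below are illustrative assumptions, not the configuration DoneDeal used.

```python
# Sketch of the parameters a managed-cluster launch takes; cluster
# name, instance types, counts and bucket are hypothetical examples.
cluster_config = {
    "Name": "nightly-etl",
    "ReleaseLabel": "emr-5.2.0",
    "Instances": {
        "MasterInstanceType": "m3.xlarge",
        "SlaveInstanceType": "m3.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate when steps finish
    },
    "LogUri": "s3://my-bucket/emr-logs/",
    "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
}
# With boto3 and credentials this maps directly onto:
#   boto3.client("emr").run_job_flow(**cluster_config)
print(sorted(cluster_config))
```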

Process: Redshift

Redshift Cont’d
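The usual way S3 data lands in Redshift is the COPY command, issued over an ordinary SQL connection. The table, bucket path, and IAM role below are hypothetical placeholders.

```python
# Sketch: build a Redshift COPY statement that bulk-loads gzipped
# JSON from S3. All identifiers here are hypothetical examples.

def copy_statement(table, s3_path, iam_role):
    """Build a Redshift COPY statement for gzipped JSON in S3."""
    return (
        "COPY {} FROM '{}' "
        "IAM_ROLE '{}' "
        "FORMAT AS JSON 'auto' GZIP;"
    ).format(table, s3_path, iam_role)

sql = copy_statement(
    "ad_views",
    "s3://my-bucket/events/2017/01/19/",
    "arn:aws:iam::123456789012:role/redshift-copy",
)
print(sql)
# Executed via any PostgreSQL driver (e.g. psycopg2) connected to the cluster.
```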

Orchestrate: Data Pipeline
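A Data Pipeline is driven by a pipeline-definition document: schedule objects plus activities that reference them. The minimal sketch below assumes the JSON definition-file shape; the object ids, period, and dates are illustrative only.

```python
import json

# Sketch of a pipeline definition: a daily schedule plus one EMR
# activity bound to it. Object ids and field values are hypothetical.
definition = {
    "objects": [
        {"id": "Daily", "type": "Schedule",
         "period": "1 day", "startDateTime": "2017-01-19T00:00:00"},
        {"id": "EmrEtl", "type": "EmrActivity",
         "schedule": {"ref": "Daily"}},
    ]
}
print(json.dumps(definition, indent=2))
# A definition like this is uploaded to an existing pipeline before
# it is activated and the scheduler starts running the activities.
```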

Visualise: Tableau, QuickSight..

Q&A

• Nigel and Martin ..