SoftServe's Hadoop Demo Lab

Page 1: SoftServe's Hadoop Demo Lab

SOFTSERVE’S HADOOP DEMO LAB

August 2014

Page 2: SoftServe's Hadoop Demo Lab

Agenda

1) Who we are & What we do

2) Why we started this project

3) High-Level Task Overview

4) Task Analysis

5) Solution Architecture

6) Trade-off Analysis

7) Development Aspects

Page 3: SoftServe's Hadoop Demo Lab

WHO WE ARE & WHAT WE DO

Page 4: SoftServe's Hadoop Demo Lab


▪ Leading global Product and Application Development partner founded in 1993

▪ 3,300+ employees across North America, Ukraine and Western Europe

▪ Thousands of successful outsourcing projects!

SaaS/Cloud Solutions . Mobility Solutions . UX/UI . BI/Analytics/Big Data . Software Architecture . Security

Clients include:

Page 5: SoftServe's Hadoop Demo Lab

Why SoftServe

• Dedicated Architecture Group (including BI and Big Data, 40+ architects)

• Demo Hadoop Environment

• Reference architecture library

• 10+ successful BI/Big Data projects

• Certified Big Data engineers (Hadoop, MongoDB)

• Partnership with major RDBMS, BI and Big Data vendors

Page 6: SoftServe's Hadoop Demo Lab

What we do: Services

1) Design & Assessment

2) Optimization & Modernization

3) POC & Prototyping

4) Development and Quality Control

5) Production and Non-Production Support

Page 7: SoftServe's Hadoop Demo Lab

Technology Stack

Data Warehouse

Data Integration

Big Data and NoSQL

BI Platforms

Page 8: SoftServe's Hadoop Demo Lab

Big Data Analytics Expertise

Page 9: SoftServe's Hadoop Demo Lab

WHY WE STARTED THIS PROJECT

Page 10: SoftServe's Hadoop Demo Lab
Page 11: SoftServe's Hadoop Demo Lab

Demo Lab: Why?

1) Increase Internal Experience

2) Create Reference Solution w/o NDA Limitations

3) Get Playground for Tests

4) Provide Demo Environment for Customers (using their data)

5) Decrease time to Deliver (by introducing automation)

Page 12: SoftServe's Hadoop Demo Lab

HADOOP DEMO LAB: HIGH-LEVEL TASK OVERVIEW

Page 13: SoftServe's Hadoop Demo Lab

Demo Lab: Input Data

Data Volume
• 270-300 Web Servers (Apache HTTPD)
• 447,392 events per minute
• 644,245,094 events / day
• ~100-250 bytes per event
• 150 GB of data per day

Log Types
1) Apache HTTPD access log
2) Apache HTTPD error log
3) Service log (CPU, RAM, Disk I/O, Disk Space)
4) Application server servlet log

Retention
• Last 30 days: Raw data
• Last 24 hours: per minute aggregation
• Whole period: per hour aggregation
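The volume figures on this slide can be cross-checked with simple arithmetic. The sketch below is only an illustration: the function names are made up here, and gigabytes are assumed to be decimal (10^9 bytes). It takes the per-minute rate and the per-event size range from the slide and derives the daily event count and the daily volume range.

```python
# Sanity check of the slide's input-data figures (illustrative sketch only).
EVENTS_PER_MINUTE = 447_392          # from the slide
MINUTES_PER_DAY = 24 * 60
BYTES_PER_EVENT_RANGE = (100, 250)   # from the slide (~100-250 bytes per event)
RAW_RETENTION_DAYS = 30              # from the slide (raw data kept 30 days)


def events_per_day(events_per_minute: int) -> int:
    """Daily event count for a constant per-minute rate."""
    return events_per_minute * MINUTES_PER_DAY


def daily_volume_gb(events: int, bytes_per_event: int) -> float:
    """Daily volume in decimal gigabytes (assumption: 1 GB = 10**9 bytes)."""
    return events * bytes_per_event / 1e9


if __name__ == "__main__":
    per_day = events_per_day(EVENTS_PER_MINUTE)
    low, high = (daily_volume_gb(per_day, b) for b in BYTES_PER_EVENT_RANGE)
    print(f"events/day: {per_day:,}")
    print(f"daily volume: {low:.1f} to {high:.1f} GB")
    print(f"30-day raw volume at 150 GB/day: {150 * RAW_RETENTION_DAYS / 1000:.1f} TB")
```

The product 447,392 x 1,440 is within about a thousand events of the slide's daily figure, which suggests the per-minute number is a rounded average. The stated 150 GB per day sits near the upper end of the computed range, so it appears to correspond to events of roughly 230 bytes.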

Page 14: SoftServe's Hadoop Demo Lab

Demo Lab: Log Data Examples

Access log:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Error log:
[Sun Mar 7 20:58:27 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] File does not exist: /home/httpd/twiki/view/Main/WebHome

Vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 305416 260688 29160 2356920 2 2 4 1 0 0 6 1 92 2 0

iostat:
Linux 2.6.32-100.28.5.el6.x86_64 (dev-db) 07/09/2011
avg-cpu: %user %nice %system %iowait %steal %idle
5.68 0.00 0.52 2.03 0.00 91.76
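To show what turning such a line into structured fields might look like, here is a minimal sketch that parses the access-log example above, which follows the Apache Common Log Format. The regular expression and the `parse_access_line` helper are assumptions made for illustration; the transcript does not show how the demo lab actually parsed its logs.

```python
import re
from datetime import datetime
from typing import Optional

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Parse one access-log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # %b relies on English month abbreviations (the default C locale).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(parse_access_line(sample))
```

Returning None for lines that do not match lets a caller count and skip malformed input instead of failing the whole batch.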

Page 15: SoftServe's Hadoop Demo Lab

TASK ANALYSIS

Page 16: SoftServe's Hadoop Demo Lab

Architecture Drivers: Use Cases

Page 17: SoftServe's Hadoop Demo Lab

Architecture Drivers: Quality Attributes (1/3)

Page 18: SoftServe's Hadoop Demo Lab

Architecture Drivers: Quality Attributes (2/3)

Page 19: SoftServe's Hadoop Demo Lab

Architecture Drivers: Quality Attributes (3/3)

Page 20: SoftServe's Hadoop Demo Lab

Architecture Drivers: Limitations

Page 21: SoftServe's Hadoop Demo Lab

Demo Lab: Marketecture

Page 22: SoftServe's Hadoop Demo Lab

SOLUTION ARCHITECTURE

Page 23: SoftServe's Hadoop Demo Lab

Reference Architecture Selection

Options:

• Traditional Relational
• Extended Relational
• Non-Relational
• Lambda Architecture (Hybrid)
• Data Refinery (Hybrid)

Lambda Architecture:

• Simultaneous access to real-time and historical data
• Isolated Design and Development
• Increased Fault Tolerance
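The idea behind simultaneous access to real-time and historical data can be illustrated with a toy, in-memory model of the three Lambda layers. This is a sketch under stated assumptions: the `LambdaSketch` class and its method names are hypothetical, everything runs in one process, and the transcript does not say which tools the lab used for each layer.

```python
from collections import Counter


class LambdaSketch:
    """Toy Lambda Architecture: counts events per key (e.g. per HTTP status)."""

    def __init__(self) -> None:
        self.master = []             # append-only raw events (batch layer input)
        self.batch_view = Counter()  # recomputed from scratch by each batch run
        self.speed_view = Counter()  # incremental counts since the last batch run

    def ingest(self, key) -> None:
        """A new event is stored raw and also counted by the speed layer."""
        self.master.append(key)
        self.speed_view[key] += 1

    def run_batch(self) -> None:
        """Rebuild the batch view from all raw data, then reset the speed view."""
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, key) -> int:
        """Serving layer: merge the historical and the real-time counts."""
        return self.batch_view[key] + self.speed_view[key]


if __name__ == "__main__":
    lab = LambdaSketch()
    for status in (200, 200, 404):
        lab.ingest(status)
    lab.run_batch()
    lab.ingest(200)
    print(lab.query(200))
```

Because the batch view is always rebuilt from the raw data, an error in the speed layer is discarded at the next batch run, which is one way to read the fault-tolerance point on the slide.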

Page 24: SoftServe's Hadoop Demo Lab

Lambda Architecture

Page 25: SoftServe's Hadoop Demo Lab

Data Flow

Page 26: SoftServe's Hadoop Demo Lab

Infrastructure View

Page 27: SoftServe's Hadoop Demo Lab

TRADE-OFF ANALYSIS

Page 28: SoftServe's Hadoop Demo Lab

Distribution Selection

Page 29: SoftServe's Hadoop Demo Lab

Hive Stinger vs Impala

Compression Ratio

Access Speed

Page 30: SoftServe's Hadoop Demo Lab

BI Tool Selection

Options:
• Tableau
• JasperSoft
• MicroStrategy
• QlikView

MicroStrategy:
• Powerful and Feature-Rich BI Tool
• 31-day trial period w/o trial key
• Well-integrated with Hadoop (and Impala)
• Easy to install in silent mode (command line)

Page 31: SoftServe's Hadoop Demo Lab

DEVELOPMENT ASPECTS

Page 32: SoftServe's Hadoop Demo Lab

Automation

80%: Local Development (Development and Debugging)

20%: Cloud Development (F&P Testing, Demo)

Page 33: SoftServe's Hadoop Demo Lab

Log Generator

• 4200 events / second (File source)
• 5500 events / second (TCP source)
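A log generator of this kind could be sketched as follows. This is not the lab's generator: the paths, the status mix, the function names and the in-memory sink are all assumptions made for illustration. It emits synthetic access-log lines in Common Log Format and measures how many lines per second it writes.

```python
import io
import random
import time
from datetime import datetime, timedelta, timezone

PATHS = ["/index.html", "/apache_pb.gif", "/login", "/api/items"]  # hypothetical
STATUSES = [200, 200, 200, 304, 404, 500]                          # hypothetical mix


def generate_access_lines(count, seed=0, start=None):
    """Yield `count` synthetic access-log lines; the same seed gives the same lines."""
    rng = random.Random(seed)
    ts = start or datetime(2014, 8, 1, tzinfo=timezone.utc)
    for _ in range(count):
        host = ".".join(str(rng.randint(1, 254)) for _ in range(4))
        path = rng.choice(PATHS)
        status = rng.choice(STATUSES)
        size = rng.randint(200, 5000)
        stamp = ts.strftime("%d/%b/%Y:%H:%M:%S %z")
        yield f'{host} - - [{stamp}] "GET {path} HTTP/1.1" {status} {size}'
        ts += timedelta(milliseconds=rng.randint(0, 5))


def write_lines(sink, lines):
    """Write lines to any file-like sink and return how many were written."""
    written = 0
    for line in lines:
        sink.write(line + "\n")
        written += 1
    return written


if __name__ == "__main__":
    sink = io.StringIO()  # swap for open("access.log", "w") to get a file source
    started = time.perf_counter()
    total = write_lines(sink, generate_access_lines(50_000))
    elapsed = time.perf_counter() - started
    print(f"{total / elapsed:,.0f} lines/second written to the sink")
```

The rate printed here only measures the generator itself. The figures on the slide presumably describe throughput through the file and TCP sources of the ingestion pipeline, which this sketch does not model.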

Page 34: SoftServe's Hadoop Demo Lab

Accurate Sizing

Event rates: 20k/min, 50k/min, 100k/min, 200k/min

Calculator!
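A sizing calculator for rates like these might be sketched as below. The formulas are a simplified assumption rather than the lab's actual calculator: the replication factor of 3 is the HDFS default, while the 200-byte event size, the 4 TB of usable disk per node and the 25% headroom are hypothetical placeholders. The 30-day retention matches the raw-data retention stated earlier in the deck.

```python
import math


def required_storage_tb(events_per_min, bytes_per_event, retention_days,
                        replication=3, compression_ratio=1.0):
    """Storage needed for raw events, in decimal terabytes (10**12 bytes)."""
    raw_bytes = events_per_min * 60 * 24 * bytes_per_event * retention_days
    return raw_bytes * replication / compression_ratio / 1e12


def required_data_nodes(storage_tb, usable_tb_per_node, headroom=0.25):
    """Data nodes needed when `headroom` of each node's disk is kept free."""
    return math.ceil(storage_tb / (usable_tb_per_node * (1 - headroom)))


if __name__ == "__main__":
    for rate in (20_000, 50_000, 100_000, 200_000):  # rates from the slide
        tb = required_storage_tb(rate, bytes_per_event=200, retention_days=30)
        nodes = required_data_nodes(tb, usable_tb_per_node=4.0)
        print(f"{rate:>7,}/min: {tb:6.2f} TB, {nodes} data node(s)")
```

This only covers disk capacity. A fuller calculator would also account for the aggregated tables, temporary space for jobs, and CPU and memory requirements.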

Page 35: SoftServe's Hadoop Demo Lab

Reference


1) Install Hadoop (CDH4) on 5 nodes with VMWare, CDH4, Cloudera Manager 4
https://www.youtube.com/watch?v=CobVqNMiqww

2) Puppet & Vagrant Tutorial
http://puppetlabs.com/blog/puppet-and-vagrant-tutorial

3) Hardware for Hadoop
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.0/bk_cluster-planning-guide/content/hardware-selection-for-hbase.html

4) How to Refine and Visualize Server Log Data
http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-server-log-data/

5) Hadoop Cluster Sizing
http://hortonworks.com/wp-content/uploads/downloads/2013/06/Hortonworks.ClusterConfigGuide.1.0.pdf

Page 36: SoftServe's Hadoop Demo Lab


Thank You!

SoftServe US Office
One Congress Plaza, 111 Congress Avenue, Suite 2700
Austin, TX 78701
Tel: 512.516.8880

Contacts Valentyn [email protected]: 866.687.3588 x4341