SoftServe's Hadoop Demo Lab

SOFTSERVE’S HADOOP DEMO LAB Aug, 2014

Description

SoftServe's Hadoop Demo Lab is a project that aggregates log files from 300 Apache HTTPD web servers and loads them into a Hadoop/Elasticsearch cluster for later analysis with MicroStrategy and Kibana. Notably, the entire deployment is fully automated with Vagrant and Puppet.

Transcript of SoftServe's Hadoop Demo Lab

Page 1: SoftServe's Hadoop Demo Lab

SOFTSERVE’S HADOOP DEMO LAB

Aug, 2014

Page 2: SoftServe's Hadoop Demo Lab

Agenda

1) Who we are & What we do

2) Why we started this project

3) High-Level Task Overview

4) Task Analysis

5) Solution Architecture

6) Trade-off Analysis

7) Development Aspects

Page 3: SoftServe's Hadoop Demo Lab

WHO WE ARE & WHAT WE DO

Page 4: SoftServe's Hadoop Demo Lab


▪ Leading global Product and Application Development partner founded in 1993

▪ 3,300+ employees across North America, Ukraine and Western Europe

▪ Thousands of successful outsourcing projects!

SaaS/Cloud Solutions · Mobility Solutions · UX/UI · BI/Analytics/Big Data · Software Architecture · Security

Clients include:

Page 5: SoftServe's Hadoop Demo Lab

Why SoftServe

• Dedicated Architecture Group (including BI and Big Data, 40+ architects)

• Demo Hadoop Environment

• Reference architecture library

• 10+ successful BI/Big Data projects

• Certified Big Data engineers (Hadoop, MongoDB)

• Partnership with major RDBMS, BI and Big Data vendors

Page 6: SoftServe's Hadoop Demo Lab

What we do: Services

1) Design & Assessment

2) Optimization & Modernization

3) POC & Prototyping

4) Development and Quality Control

5) Production and Non-Production Support

Page 7: SoftServe's Hadoop Demo Lab

Technology Stack

Data Warehouse

Data Integration

Big Data and NoSQL

BI Platforms

Page 8: SoftServe's Hadoop Demo Lab

Big Data Analytics Expertise

Page 9: SoftServe's Hadoop Demo Lab

WHY WE STARTED THIS PROJECT

Page 10: SoftServe's Hadoop Demo Lab
Page 11: SoftServe's Hadoop Demo Lab

Demo Lab: Why?

1) Increase Internal Experience

2) Create Reference Solution w/o NDA Limitations

3) Get Playground for Tests

4) Provide Demo Environment for Customers (using their data)

5) Decrease Time to Delivery (by introducing automation)

Page 12: SoftServe's Hadoop Demo Lab

HADOOP DEMO LAB: HIGH-LEVEL TASK OVERVIEW

Page 13: SoftServe's Hadoop Demo Lab

Demo Lab: Input Data

Data Volume
• 270-300 web servers (Apache HTTPD)
• 447 392 events per minute
• 644 245 094 events / day
• ~100-250 bytes per event
• 150 GB of data per day
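As a quick consistency check on the Data Volume figures, the per-minute rate can be scaled to a daily total and multiplied by the event-size range. This is a minimal sketch of that arithmetic; the 100 and 250 byte bounds come from the slide, while using decimal gigabytes (10^9 bytes) is an assumption.

```python
# Consistency check of the Data Volume figures (illustrative arithmetic only).
EVENTS_PER_MINUTE = 447_392          # from the slide
MINUTES_PER_DAY = 24 * 60
BYTES_PER_EVENT_RANGE = (100, 250)   # "~100-250 bytes per event"


def events_per_day(events_per_minute: int) -> int:
    """Scale a per-minute event rate to a daily total."""
    return events_per_minute * MINUTES_PER_DAY


def daily_volume_gb(events: int, bytes_per_event: int) -> float:
    """Daily raw volume in decimal gigabytes (1 GB = 10**9 bytes, an assumption)."""
    return events * bytes_per_event / 10**9


if __name__ == "__main__":
    daily = events_per_day(EVENTS_PER_MINUTE)
    low, high = (daily_volume_gb(daily, b) for b in BYTES_PER_EVENT_RANGE)
    print(f"{daily:,} events/day")
    print(f"{low:.0f}-{high:.0f} GB/day")
```

The product, 644,244,480 events/day, agrees with the slide's 644 245 094 to within rounding of the per-minute rate, and the stated 150 GB/day falls inside the resulting 64 to 161 GB range, toward the upper end of the event-size estimate.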

Log Types
1) Apache HTTPD access log
2) Apache HTTPD error log
3) Service log (CPU, RAM, Disk I/O, Disk Space)
4) Application server servlet log

Retention
• Last 30 days: raw data
• Last 24 hours: per-minute aggregation
• Whole period: per-hour aggregation
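The retention policy keeps raw events plus two rollups. Below is a minimal sketch of the per-minute and per-hour bucketing, assuming events have already been parsed into naive timestamps; the helper names `rollup` and `prune` are hypothetical, and a real pipeline would run this aggregation on the cluster rather than in a single Python process.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable

# Fields to zero out when truncating a timestamp to each bucket size
_RESET_FIELDS = {
    "minute": {"second": 0, "microsecond": 0},
    "hour": {"minute": 0, "second": 0, "microsecond": 0},
}


def rollup(timestamps: Iterable[datetime], granularity: str) -> Counter:
    """Count events per minute or per hour bucket (hypothetical helper)."""
    try:
        reset = _RESET_FIELDS[granularity]
    except KeyError:
        raise ValueError(f"unsupported granularity: {granularity!r}") from None
    return Counter(ts.replace(**reset) for ts in timestamps)


def prune(buckets: Counter, now: datetime, keep: timedelta) -> Counter:
    """Drop buckets whose start lies outside the retention window ending at `now`."""
    return Counter({start: n for start, n in buckets.items() if now - start <= keep})


if __name__ == "__main__":
    events = [
        datetime(2014, 8, 1, 13, 55, 36),
        datetime(2014, 8, 1, 13, 55, 59),
        datetime(2014, 8, 1, 13, 56, 1),
        datetime(2014, 8, 1, 14, 0, 0),
    ]
    per_minute = rollup(events, "minute")  # retained for the last 24 hours
    per_hour = rollup(events, "hour")      # retained for the whole period
    recent = prune(per_minute, now=datetime(2014, 8, 2, 13, 56), keep=timedelta(hours=24))
    print(per_hour)
    print(sorted(recent))
```

Pruning only the per-minute buckets mirrors the policy: hourly buckets are never dropped, while minute-level detail ages out after 24 hours.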

Page 14: SoftServe's Hadoop Demo Lab

Demo Lab: Log Data Examples

Access log:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Error log:
[Sun Mar 7 20:58:27 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] File does not exist: /home/httpd/twiki/view/Main/WebHome

vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 305416 260688  29160 2356920    2    2     4     1    0    0  6  1 92  2  0

iostat:
Linux 2.6.32-100.28.5.el6.x86_64 (dev-db)   07/09/2011

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.68    0.00    0.52    2.03    0.00   91.76
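The access-log example is in Apache's Common Log Format (host, ident, user, timestamp, request line, status, bytes). A minimal sketch of parsing one such line with a regular expression; the pattern and the `parse_access_line` name are illustrative, and the pattern deliberately rejects the Combined format, which appends referer and user-agent fields. Month-name parsing with `%b` assumes an English/C locale.

```python
import re
from datetime import datetime

# host ident authuser [timestamp] "request line" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_access_line(line: str) -> dict:
    """Parse one Common Log Format line; raise ValueError if it does not match."""
    match = CLF_PATTERN.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    record = match.groupdict()
    # Apache writes timestamps such as 10/Oct/2000:13:55:36 -0700
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # "-" means no response body was sent
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


if __name__ == "__main__":
    sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    rec = parse_access_line(sample)
    print(rec["host"], rec["user"], rec["status"], rec["size"], rec["time"].isoformat())
```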

Page 15: SoftServe's Hadoop Demo Lab

TASK ANALYSIS

Page 16: SoftServe's Hadoop Demo Lab

Architecture Drivers: Use Cases

Page 17: SoftServe's Hadoop Demo Lab

Architecture Drivers: Quality Attributes (1/3)

Page 18: SoftServe's Hadoop Demo Lab

Architecture Drivers: Quality Attributes (2/3)

Page 19: SoftServe's Hadoop Demo Lab

Architecture Drivers: Quality Attributes (3/3)

Page 20: SoftServe's Hadoop Demo Lab

Architecture Drivers: Limitations

Page 21: SoftServe's Hadoop Demo Lab

Demo Lab: Marketecture

Page 22: SoftServe's Hadoop Demo Lab

SOLUTION ARCHITECTURE

Page 23: SoftServe's Hadoop Demo Lab

Reference Architecture Selection

Options:

• Traditional Relational
• Extended Relational
• Non-Relational
• Lambda Architecture (Hybrid)
• Data Refinery (Hybrid)

Lambda Architecture:

• Simultaneous access to real-time and historical data
• Isolated design and development
• Increased fault tolerance
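In the Lambda Architecture, a query combines a precomputed batch view (complete up to some cutoff, but stale) with a real-time view that covers only events arriving after that cutoff. A minimal in-memory sketch of that merge for per-URL hit counts; the class and method names are hypothetical, and in a real deployment each view would live in its own storage layer rather than in Python counters.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

Event = Tuple[datetime, str]  # (timestamp, requested URL)


class LambdaCounts:
    """Toy Lambda serving layer for per-URL hit counts (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self.batch_view: Counter = Counter()   # url -> hits, recomputed from all raw data
        self.batch_cutoff = datetime.min       # events at or before this are in batch_view
        self.speed_view: Counter = Counter()   # (timestamp, url) -> hits after the cutoff

    def on_event(self, ts: datetime, url: str) -> None:
        """Speed layer: update incrementally as each event streams in."""
        if ts > self.batch_cutoff:
            self.speed_view[(ts, url)] += 1

    def run_batch(self, master_dataset: Iterable[Event], cutoff: datetime) -> None:
        """Batch layer: recompute from the immutable master dataset up to `cutoff`."""
        self.batch_view = Counter(url for ts, url in master_dataset if ts <= cutoff)
        self.batch_cutoff = cutoff
        # Discard speed-layer entries that the new batch view now covers
        self.speed_view = Counter(
            {key: n for key, n in self.speed_view.items() if key[0] > cutoff}
        )

    def query(self, url: str) -> int:
        """Serving layer: merge the historical (batch) and real-time (speed) views."""
        recent = sum(n for (_, u), n in self.speed_view.items() if u == url)
        return self.batch_view[url] + recent


if __name__ == "__main__":
    log = [
        (datetime(2014, 8, 1, 10, 0), "/a"),
        (datetime(2014, 8, 1, 10, 5), "/a"),
        (datetime(2014, 8, 1, 10, 7), "/b"),
    ]
    view = LambdaCounts()
    for ts, url in log:
        view.on_event(ts, url)
    view.run_batch(log, cutoff=datetime(2014, 8, 1, 10, 6))
    view.on_event(datetime(2014, 8, 1, 10, 8), "/a")
    print(view.query("/a"), view.query("/b"))
```

Because the batch view is always recomputed from raw data, an error in the speed layer only persists until the next batch run, which is where the fault-tolerance argument comes from; the two paths can also be built and tested separately.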

Page 24: SoftServe's Hadoop Demo Lab

Lambda Architecture

Page 25: SoftServe's Hadoop Demo Lab

Data Flow

Page 26: SoftServe's Hadoop Demo Lab

Infrastructure View

Page 27: SoftServe's Hadoop Demo Lab

TRADE-OFF ANALYSIS

Page 28: SoftServe's Hadoop Demo Lab

Distribution Selection

Page 29: SoftServe's Hadoop Demo Lab

Hive Stinger vs Impala

Compression Ratio

Access Speed

Page 30: SoftServe's Hadoop Demo Lab

BI Tool Selection

Options:
• Tableau
• Jaspersoft
• MicroStrategy
• QlikView

MicroStrategy:
• Powerful and feature-rich BI tool
• 31-day trial period without a trial key
• Well integrated with Hadoop (and Impala)
• Easy to install in silent mode (command line)

Page 31: SoftServe's Hadoop Demo Lab

DEVELOPMENT ASPECTS

Page 32: SoftServe's Hadoop Demo Lab

Automation

80%: Local Development (Development and Debugging)

20%: Cloud Development (F&P Testing, Demo)

Page 33: SoftServe's Hadoop Demo Lab

Log Generator

• 4200 events / second (File source)
• 5500 events / second (TCP source)
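A log generator of this kind can be approximated with a short routine that emits synthetic Common Log Format records. This is a hypothetical sketch, not the lab's generator: the address range, paths and status mix are made-up parameters, and a seeded random generator keeps the output reproducible. Its throughput on a given machine would have to be measured rather than assumed to match the rates above.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Iterator

# Hypothetical value pools for synthetic traffic
PATHS = ["/", "/index.html", "/apache_pb.gif", "/login", "/search"]
STATUSES = [200, 200, 200, 200, 304, 404, 500]  # repetition weights the mix toward 200


def generate_lines(count: int, start: datetime, seed: int = 42) -> Iterator[str]:
    """Yield `count` synthetic Common Log Format lines, one second apart."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    for i in range(count):
        ts = start + timedelta(seconds=i)
        host = f"10.0.{rng.randint(0, 255)}.{rng.randint(1, 254)}"
        status = rng.choice(STATUSES)
        # CLF writes "-" when no response body was sent (e.g. 304 Not Modified)
        size = "-" if status == 304 else str(rng.randint(200, 5000))
        stamp = ts.strftime("%d/%b/%Y:%H:%M:%S %z")
        yield f'{host} - - [{stamp}] "GET {rng.choice(PATHS)} HTTP/1.1" {status} {size}'


if __name__ == "__main__":
    for line in generate_lines(3, datetime(2014, 8, 1, tzinfo=timezone.utc)):
        print(line)
```

Writing these lines to a file or a TCP socket would correspond to the two source types benchmarked above; pacing to a target rate is left out to keep the sketch small.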

Page 34: SoftServe's Hadoop Demo Lab

Accurate Sizing

[Chart: 20k/min, 50k/min, 100k/min, 200k/min]

Calculator!
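A sizing calculator can start from the event rate and turn it into storage and load-generation requirements. A minimal sketch under stated assumptions: HDFS replication factor 3 (the HDFS default), 30 days of raw data as in the retention requirements, and no compression unless a ratio is supplied. The function names are hypothetical, the model ignores aggregate tables and working space, and `generator_instances` assumes generator rates add linearly, using the 4200 events/second file-source figure from the Log Generator slide.

```python
import math

MINUTES_PER_DAY = 24 * 60


def raw_storage_tb(events_per_minute: int, bytes_per_event: int,
                   retention_days: int = 30, replication: int = 3,
                   compression_ratio: float = 1.0) -> float:
    """Disk needed for the retained raw events, in decimal TB (hypothetical model)."""
    daily_bytes = events_per_minute * MINUTES_PER_DAY * bytes_per_event
    stored_bytes = daily_bytes * retention_days * replication / compression_ratio
    return stored_bytes / 10**12


def generator_instances(events_per_minute: int, per_instance_eps: int) -> int:
    """Generator processes needed to reproduce a load, assuming rates add linearly."""
    return math.ceil(events_per_minute / 60 / per_instance_eps)


if __name__ == "__main__":
    for rate in (20_000, 50_000, 100_000, 200_000, 447_392):
        print(f"{rate:>7,} events/min -> {raw_storage_tb(rate, 250):6.2f} TB raw, "
              f"{generator_instances(rate, 4200)} file-source generator(s)")
```

At the full 447 392 events/minute and 250 bytes per event, this model gives about 14.5 TB of replicated raw data for 30 days and needs two generator instances, since one file-source generator covers roughly 56% of that rate.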

Page 35: SoftServe's Hadoop Demo Lab

Reference


1) Install Hadoop (CDH4) on 5 nodes with VMWare, CDH4, Cloudera Manager 4: https://www.youtube.com/watch?v=CobVqNMiqww

2) Puppet & Vagrant Tutorial: http://puppetlabs.com/blog/puppet-and-vagrant-tutorial

3) Hardware for Hadoop: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.0/bk_cluster-planning-guide/content/hardware-selection-for-hbase.html

4) How to Refine and Visualize Server Log Data: http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-server-log-data/

5) Hadoop Cluster Sizing: http://hortonworks.com/wp-content/uploads/downloads/2013/06/Hortonworks.ClusterConfigGuide.1.0.pdf

Page 36: SoftServe's Hadoop Demo Lab


Thank You!

SoftServe US Office
One Congress Plaza, 111 Congress Avenue, Suite 2700
Austin, TX 78701
Tel: 512.516.8880

Contacts
Valentyn [email protected]
Tel: 866.687.3588 x4341