Log Data Analysis Platform

Post on 20-Jul-2015

LOG DATA ANALYSIS PLATFORM

May, 2015

Agenda

1) User-Group Introduction

2) Problematic

3) Log Data Analysis System Overview

4) Task Analysis

5) Solution Architecture

6) Trade-off Analysis

7) Automation

8) Performance Testing

9) Outcome & Plans

PROBLEMATIC

Demo Lab: Why did we start this project?

1) Increase Internal Experience

2) Create Reference Solution w/o NDA Limitations

3) Get Playground for Tests

4) Provide Demo Environment for Customers (using their data)

5) Decrease time to Market (by introducing automation)

LOG DATA ANALYSIS PLATFORM: OVERVIEW

Log Data Analysis Platform Details

Key Facts:

• ~270-300 Web Servers

• Log Types: HTTPD Access logs, Error logs, Application Server Servlet logs, OS Service logs

• ~500K events per minute

• 150GB of data per day

Technologies:

• Flume

• Hadoop/HDFS, MapReduce

• Hive, Impala

• Oozie

• Elasticsearch, Kibana 3

• Tableau Analytics platform

• Puppet + Vagrant
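The key facts above imply a rough average event size, which is useful when sanity-checking storage estimates. The sketch below is only back-of-the-envelope arithmetic: it assumes that ~500K events per minute is a sustained average rather than a peak, and that 1 GB means 10^9 bytes. Neither assumption is stated in the slides.

```python
# Back-of-the-envelope arithmetic from the slide's key facts.
# Assumptions (not from the slides): the event rate is a sustained
# average, and 1 GB = 10**9 bytes.
EVENTS_PER_MINUTE = 500_000
DATA_PER_DAY_BYTES = 150 * 10**9

events_per_second = EVENTS_PER_MINUTE / 60
events_per_day = EVENTS_PER_MINUTE * 60 * 24
avg_event_bytes = DATA_PER_DAY_BYTES / events_per_day

print(f"events per second: {events_per_second:.0f}")
print(f"events per day:    {events_per_day}")
print(f"avg event size:    {avg_event_bytes:.0f} bytes")
```

If the quoted rate is actually a peak, the true average event size would be larger than this estimate.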

Log Data Examples

Access log:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
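The access log line above follows the Apache Common Log Format. As a minimal sketch of how such a line could be split into fields before loading, the snippet below uses a regular expression. The names `CLF_PATTERN` and `parse_access_line` are illustrative and not part of the platform described in the slides.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_access_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Month names are parsed with the current locale (English assumed).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no body bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    parts = record["request"].split()
    if len(parts) == 3:
        record["method"], record["path"], record["protocol"] = parts
    return record

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
record = parse_access_line(sample)
print(record["host"], record["method"], record["path"], record["status"])
```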

Error log:

[Sun Mar 7 20:58:27 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed

[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] File does not exist: /home/httpd/twiki/view/Main/WebHome
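Error log entries like the ones above carry a timestamp, a severity level, an optional client address, and a free-text message. The following is a hedged sketch of extracting those parts. `ERROR_PATTERN` and `parse_error_line` are hypothetical names, and the pattern covers only the layout shown in these examples.

```python
import re
from datetime import datetime

# Layout assumed from the examples: [time] [level] [client ip] message
ERROR_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] '
    r'(?:\[client (?P<client>[^\]]+)\] )?(?P<message>.*)'
)

def parse_error_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = ERROR_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Day and month names are parsed with the current locale (English assumed).
    record["time"] = datetime.strptime(record["time"], "%a %b %d %H:%M:%S %Y")
    return record

error_sample = ('[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] '
                'File does not exist: /home/httpd/twiki/view/Main/WebHome')
error_record = parse_error_line(error_sample)
print(error_record["level"], error_record["client"], error_record["message"])
```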

Vmstat

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------

r b swpd free buff cache si so bi bo in cs us sy id wa st

0 0 305416 260688 29160 2356920 2 2 4 1 0 0 6 1 92 2 0
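Because vmstat prints a header row of column names followed by rows of integers, one simple way to turn a sample into named metrics is to pair the two rows. This is a minimal sketch using the sample above; `parse_vmstat` is an illustrative helper, not part of the platform.

```python
def parse_vmstat(header_line, value_line):
    """Pair vmstat column names with the integer values of one sample row."""
    names = header_line.split()
    values = [int(v) for v in value_line.split()]
    if len(names) != len(values):
        raise ValueError("header and value rows have different column counts")
    return dict(zip(names, values))

vmstat_header = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
vmstat_values = "0 0 305416 260688 29160 2356920 2 2 4 1 0 0 6 1 92 2 0"
stats = parse_vmstat(vmstat_header, vmstat_values)
print(stats["free"], stats["id"], stats["wa"])
```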

iostat

Linux 2.6.32-100.28.5.el6.x86_64 (dev-db) 07/09/2011

avg-cpu: %user %nice %system %iowait %steal %idle

5.68 0.00 0.52 2.03 0.00 91.76

TASK ANALYSIS

Architecture Drivers: Use Cases

Architecture Drivers: Quality Attributes (1/3)

Architecture Drivers: Quality Attributes (2/3)

Architecture Drivers: Quality Attributes (3/3)

Architecture Drivers: Limitations

Demo Lab: Marketecture

SOLUTION ARCHITECTURE

Solution Architecture

Batch Layer Serving Layer

Speed Layer

Raw Data Storage

Data Stream

Real-time Views

Static Views Precomputing

Precomputing

Ad-hoc Batch Views

Static Batch Views

Corporate BI Tool

Legend:

Layer boundary

Data flow (with direction indicated)

Query flow

Apache HTTP Servers

Raw Data Storage Pre-computing Batch Views

Real-Time Views

Dashboard/Search

Data Stream

Real-Time Processing and Aggregations

BI Tool

Avro as a Raw Data Storage file format

Parquet as a Batch Views file format

Star schema as a Batch Views data model

Architecture: Flume Topology

Batch ETL

TRADE-OFF ANALYSIS

Distribution Selection

Hive Stinger vs Impala

Compression Ratio

Access Speed

AUTOMATION

Automation (saves time and money)

80%: Development and Debugging (Local Development)

20%: F&P Testing, Demo (Cloud Development)

vagrant up

Automation Process

Phase: VM Provisioning
Tool: Vagrant
Notes:
• Supports: VirtualBox, VMware ESX, Amazon AWS

Phase: VM Bootstrapping
Tool: Puppet
Notes:
• Installs Cloudera Manager, Cloudera Distribution Hadoop, Elasticsearch + Kibana, Flume, MicroStrategy, LogGenerator.
• Creates Cluster using Cloudera Manager API.

Phase: Configure ETL and BI
Tool: Puppet
Notes:
• Configures Flume, Oozie, Elasticsearch, Impala, Hive, MicroStrategy Dashboards

Phase: Integration Tests
Tool: Puppet
Notes:
• Generates Workload and ensures data goes through.
• Checks Logs for errors.
• Calculates timing/throughput.

PERFORMANCE TESTING

Log Generator

1 thread can generate:

• 4200 events / second (File source)

• 5500 events / second (TCP source)
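The per-thread rates above can be used to estimate how many generator threads a target load needs. The sketch below assumes throughput scales linearly with the number of threads, which the slides do not claim. `threads_needed` is a hypothetical helper.

```python
import math

# Per-thread generator rates quoted on the slide (events per second).
FILE_SOURCE_RATE = 4200
TCP_SOURCE_RATE = 5500

def threads_needed(target_events_per_minute, rate_per_thread):
    """Threads required to reach the target, assuming linear scaling."""
    target_per_second = target_events_per_minute / 60
    return math.ceil(target_per_second / rate_per_thread)

# Target taken from the platform's key facts: ~500K events per minute.
file_threads = threads_needed(500_000, FILE_SOURCE_RATE)
tcp_threads = threads_needed(500_000, TCP_SOURCE_RATE)
print(f"File source threads: {file_threads}")
print(f"TCP source threads:  {tcp_threads}")
```

In practice, contention on disk or network would likely push the real thread count above this lower bound.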

Accurate Sizing

100k/min

50k/min

20k/min

200k/min

Calculator!
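The slides do not show how the sizing calculator works. As a rough illustration of the kind of estimate such a tool might produce, the sketch below converts an event rate into daily data volume. The average event size of about 208 bytes is derived from the key facts (150GB per day at ~500K events per minute, with 1 GB taken as 10^9 bytes). The linear model and the `daily_volume_gb` function are assumptions, not the actual calculator.

```python
# Assumed average event size in bytes, derived from the key facts:
# 150 GB/day divided by (500K events/min * 1440 min/day).
AVG_EVENT_BYTES = 150e9 / (500_000 * 60 * 24)

def daily_volume_gb(events_per_minute, avg_event_bytes=AVG_EVENT_BYTES,
                    replication=1):
    """Estimated data volume per day in GB (10**9 bytes), linear model."""
    events_per_day = events_per_minute * 60 * 24
    return events_per_day * avg_event_bytes * replication / 1e9

# Load levels listed on the slide, in events per minute.
for rate in (20_000, 50_000, 100_000, 200_000):
    raw = daily_volume_gb(rate)
    # HDFS replicates each block 3 times by default.
    stored = daily_volume_gb(rate, replication=3)
    print(f"{rate:>7}/min: {raw:6.1f} GB raw, {stored:6.1f} GB with 3x replication")
```

Compression (for example with Avro or Parquet) would reduce the stored volume and is not modelled here.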

OUTCOME & PLANS

Outcome

1) Demo lab, playground, testing platform (in 1 hour)

2) Sizing Calculator

3) Helped to win 3 new customers (one is really, really huge)

4) Strategic Partnership with Cloudera

5) Tons of experience and fun

Plans

1) Add support for other Hadoop Distributions (Hortonworks, MapR)

2) Make Project Open-Source

Thank You!

SoftServe US Office

One Congress Plaza, 111 Congress Avenue, Suite 2700, Austin, TX 78701

Tel: 512.516.8880

Contacts: Valentyn Kropov

vkrop@softserveinc.com

Tel: 866.687.3588 x4341