Posted on 22-May-2015
SOFTSERVE’S HADOOP DEMO LAB
August 2014
Agenda
1) Who we are & What we do
2) Why we started this project
3) High-Level Task Overview
4) Task Analysis
5) Solution Architecture
6) Trade-off Analysis
7) Development Aspects
WHO WE ARE & WHAT WE DO
▪ Leading global Product and Application Development partner founded in 1993
▪ 3,300+ employees across North America, Ukraine and Western Europe
▪ Thousands of successful outsourcing projects!
SaaS/Cloud Solutions • Mobility Solutions • UX/UI • BI/Analytics/Big Data • Software Architecture • Security
Why SoftServe
• Dedicated Architecture Group (including BI and Big Data, 40+ architects)
• Demo Hadoop Environment
• Reference architecture library
• 10+ successful BI/Big Data projects
• Certified Big Data engineers (Hadoop, MongoDB)
• Partnership with major RDBMS, BI and Big Data vendors
What we do: Services
1) Design & Assessment
2) Optimization & Modernization
3) POC & Prototyping
4) Development and Quality Control
5) Production and Non-Production Support
Technology Stack
Data Warehouse
Data Integration
Big Data and NoSQL
BI Platforms
Big Data Analytics Expertise
WHY WE STARTED THIS PROJECT
Demo Lab: Why?
1) Increase Internal Experience
2) Create Reference Solution w/o NDA Limitations
3) Get Playground for Tests
4) Provide Demo Environment for Customers (using their data)
5) Decrease Time to Delivery (by introducing automation)
HADOOP DEMO LAB: HIGH-LEVEL TASK OVERVIEW
Demo Lab: Input Data
Data Volume
• 270-300 web servers (Apache HTTPD)
• 447,392 events per minute
• 644,245,094 events per day
• ~100-250 bytes per event
• 150 GB of data per day
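The volume figures hang together if "150 GB" is read as 150 GiB (150 × 2^30 bytes) and every event is sized at the 250-byte upper bound of the range: 150 × 2^30 / 250 ≈ 644,245,094 events per day, or about 447,392 per minute. That reading is our inference, not something the slide states. A short Python check that reproduces the numbers and adds the derived per-second and per-server rates:

```python
# Consistency check for the "Data Volume" figures.
# Assumption (not on the slide): "150 GB" means 150 GiB (150 * 2**30 bytes)
# and every event is counted at the 250-byte upper bound.

GIB = 2 ** 30
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(bytes_per_day: int, bytes_per_event: int) -> int:
    """Whole events that fit into one day's data volume."""
    return bytes_per_day // bytes_per_event


def per_server_rate(events_per_minute: int, servers: int) -> float:
    """Average events per minute emitted by one web server."""
    return events_per_minute / servers


per_day = events_per_day(150 * GIB, 250)
per_minute = per_day // MINUTES_PER_DAY
per_second = per_day / SECONDS_PER_DAY

print(f"{per_day:,} events/day")           # 644,245,094 events/day
print(f"{per_minute:,} events/minute")     # 447,392 events/minute
print(f"{per_second:,.0f} events/second")  # 7,457 events/second
low = per_server_rate(per_minute, 300)
high = per_server_rate(per_minute, 270)
print(f"{low:,.0f}-{high:,.0f} events/minute per server")  # 1,491-1,657
```

The ~7,457 events/second average is the sustained ingest rate the pipeline has to absorb; peaks will be higher.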
Log Types
1) Apache HTTPD access log
2) Apache HTTPD error log
3) Service log (CPU, RAM, Disk I/O, Disk Space)
4) Application server servlet log
Retention
• Last 30 days: raw data
• Last 24 hours: per-minute aggregation
• Whole period: per-hour aggregation
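The retention policy amounts to three tiers kept at different granularities, each with its own expiry rule. A minimal in-memory sketch in Python, assuming events arrive as (timestamp, line) pairs; the class and method names are hypothetical, and a Hadoop deployment would hold these tiers as separate tables rather than Python objects:

```python
from collections import Counter
from datetime import datetime, timedelta


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


class RetentionStore:
    """Hypothetical in-memory model of the three retention tiers."""

    RAW_RETENTION = timedelta(days=30)       # raw events: last 30 days
    MINUTE_RETENTION = timedelta(hours=24)   # per-minute counts: last 24 hours
    # per-hour counts: whole period, never expired

    def __init__(self) -> None:
        self.raw = []                # (timestamp, line) pairs
        self.per_minute = Counter()  # minute start -> event count
        self.per_hour = Counter()    # hour start -> event count

    def add(self, ts: datetime, line: str) -> None:
        self.raw.append((ts, line))
        self.per_minute[minute_bucket(ts)] += 1
        self.per_hour[hour_bucket(ts)] += 1

    def expire(self, now: datetime) -> None:
        """Drop raw events and per-minute buckets that left their window."""
        raw_cutoff = now - self.RAW_RETENTION
        self.raw = [(ts, line) for ts, line in self.raw if ts >= raw_cutoff]
        minute_cutoff = now - self.MINUTE_RETENTION
        # A minute bucket is dropped once its start time is older than the cutoff.
        for bucket in [b for b in self.per_minute if b < minute_cutoff]:
            del self.per_minute[bucket]


store = RetentionStore()
t0 = datetime(2014, 8, 1, 12, 0, 0)
for offset_s in (0, 10, 70, 3 * 3600):
    store.add(t0 + timedelta(seconds=offset_s), '"GET / HTTP/1.0" 200 512')
store.expire(now=t0 + timedelta(days=2))
```

After `expire` runs two days later, all four raw events are still kept, every per-minute bucket is gone, and the per-hour counts (3 events in the 12:00 hour, 1 in the 15:00 hour) survive.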
Demo Lab: Log Data Examples
Access log:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Error log:
[Sun Mar 7 20:58:27 2004] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] File does not exist: /home/httpd/twiki/view/Main/WebHome
vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 305416 260688  29160 2356920    2    2     4     1    0    0  6  1 92  2  0
iostat:
Linux 2.6.32-100.28.5.el6.x86_64 (dev-db)  07/09/2011
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.68    0.00    0.52    2.03    0.00   91.76
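Before any aggregation, the raw lines have to be split into fields. A minimal regex-based sketch in Python for the two Apache formats shown, assuming the access log uses Apache's common log format and the error log uses the Apache 2.2-style layout of the examples; the pattern and function names are our own. The vmstat and iostat samples are whitespace-aligned columns and would need a separate split-on-whitespace parser.

```python
import re

# Apache common log format: host ident authuser [time] "request" status bytes
ACCESS_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Apache 2.2-style error log: [time] [level] [client addr] message
ERROR_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] '
    r'(?:\[client (?P<client>[^\]]+)\] )?(?P<message>.*)'
)


def parse_access(line: str):
    """Return a dict of access-log fields, or None if the line does not match."""
    m = ACCESS_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" = no body
    return rec


def parse_error(line: str):
    """Return a dict of error-log fields, or None if the line does not match."""
    m = ERROR_RE.match(line)
    return m.groupdict() if m else None


access = parse_access(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
error = parse_error(
    '[Sun Mar 7 21:16:17 2004] [error] [client 24.70.56.49] '
    'File does not exist: /home/httpd/twiki/view/Main/WebHome'
)
```

Lines that do not match return None rather than raising, so malformed input can be counted and skipped instead of stopping the job.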
TASK ANALYSIS
Architecture Drivers: Use Cases
Architecture Drivers: Quality Attributes
Architecture Drivers: Limitations
Demo Lab: Marketecture
SOLUTION ARCHITECTURE
Reference Architecture Selection
Options:
• Traditional Relational
• Extended Relational
• Non-Relational
• Lambda Architecture (Hybrid)
• Data Refinery (Hybrid)
Lambda Architecture:
• Simultaneous access to real-time and historical data
• Isolated Design and Development
• Increased Fault Tolerance
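Simultaneous access to real-time and historical data comes from how a Lambda Architecture answers queries: a batch layer periodically recomputes a view from scratch over the full, immutable master dataset, a speed layer covers only the events the last batch run has not yet absorbed, and the serving layer merges the two at query time. A toy, single-process Python sketch of that merge; the class and method names are illustrative, and a real deployment spreads each layer across separate systems:

```python
from collections import Counter
from datetime import datetime, timedelta


class ToyLambda:
    """Single-process illustration of Lambda Architecture query merging."""

    def __init__(self) -> None:
        self.master = []              # batch layer: immutable, append-only (ts, key) log
        self.batch_view = Counter()   # precomputed counts up to batch_cutoff
        self.batch_cutoff = datetime.min
        self.speed_events = []        # speed layer: events not yet in the batch view

    def ingest(self, ts: datetime, key: str) -> None:
        """New data goes to both layers."""
        self.master.append((ts, key))
        self.speed_events.append((ts, key))

    def run_batch(self, cutoff: datetime) -> None:
        """Recompute the batch view from scratch over the master dataset,
        then drop speed-layer events the new batch view already covers."""
        self.batch_view = Counter(k for ts, k in self.master if ts <= cutoff)
        self.batch_cutoff = cutoff
        self.speed_events = [(ts, k) for ts, k in self.speed_events if ts > cutoff]

    def query(self, key: str) -> int:
        """Serving layer: historical (batch) + recent (speed) results."""
        recent = sum(1 for _, k in self.speed_events if k == key)
        return self.batch_view[key] + recent


lam = ToyLambda()
base = datetime(2014, 8, 1)
for minute in range(5):
    lam.ingest(base + timedelta(minutes=minute), "/index.html")
lam.run_batch(cutoff=base + timedelta(minutes=2))  # batch covers minutes 0-2
lam.ingest(base + timedelta(minutes=6), "/index.html")
total = lam.query("/index.html")  # 3 from batch + 3 from speed layer = 6
```

Because the batch view is always rebuilt from the raw master data, a bug in either layer can be corrected by rerunning the batch job, which is where the fault-tolerance benefit comes from.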
Lambda Architecture
Data Flow
Infrastructure View
TRADE-OFF ANALYSIS
Distribution Selection
Hive Stinger vs Impala
Compression Ratio
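The measured ratios from this slide did not survive extraction. For reference, a compression ratio is the uncompressed size divided by the compressed size, so 5x means the data shrinks to a fifth of its size. A minimal Python sketch that measures it on synthetic access-log lines with standard-library codecs; these codecs are stand-ins only, since Hive Stinger and Impala store data in Hadoop formats such as ORC and Parquet, so the printed numbers say nothing about the comparison on the slide:

```python
import bz2
import lzma
import zlib


def compression_ratio(raw: bytes, compress) -> float:
    """Uncompressed size divided by compressed size (higher is better)."""
    return len(raw) / len(compress(raw))


# Hypothetical sample: synthetic access-log lines modelled on the slide's example.
sample = "\n".join(
    f'10.0.{i % 256}.{i % 7} - - [10/Oct/2000:13:{i % 60:02d}:36 -0700] '
    f'"GET /page/{i % 50}.html HTTP/1.0" 200 {1000 + i % 4000}'
    for i in range(10_000)
).encode("ascii")

# Standard-library compressors used as stand-ins for Hadoop codecs/formats.
compressors = {
    "zlib (deflate)": lambda b: zlib.compress(b, 6),
    "bz2": lambda b: bz2.compress(b, 9),
    "lzma (xz)": lzma.compress,
}

ratios = {name: compression_ratio(sample, fn) for name, fn in compressors.items()}
for name, ratio in ratios.items():
    print(f"{name:15s} {ratio:5.1f}x")
```

Synthetic lines are far more repetitive than real traffic, so ratios measured this way overstate what production logs achieve; the same function applied to a real log sample gives a usable number.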
Access Speed
BI Tool Selection
Options:
• Tableau
• Jaspersoft
• MicroStrategy
• QlikView
MicroStrategy:
• Powerful and feature-rich BI tool
• 31-day trial period without a trial key
• Well integrated with Hadoop (and Impala)
• Easy to install in silent mode (command line)
DEVELOPMENT ASPECTS
Automation
Local Development (80%): Development and Debugging
Cloud Development (20%): F&P Testing, Demo
Log Generator
• 4,200 events/second (file source)
• 5,500 events/second (TCP source)
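A log generator only needs to emit realistic lines at a controlled rate. A minimal rate-limited sketch in Python that produces access-log lines in common log format; the paths, status mix and function names are hypothetical, and the generator behind the figures above is not described on the slide:

```python
import itertools
import random
import time
from datetime import datetime, timedelta, timezone

PATHS = ["/index.html", "/apache_pb.gif", "/login", "/search?q=hadoop"]  # hypothetical
STATUSES = [200, 200, 200, 304, 404, 500]  # weighted toward 200 by repetition


def access_line(rng: random.Random, ts: datetime) -> str:
    """One synthetic Apache access-log line in common log format."""
    host = f"10.{rng.randrange(256)}.{rng.randrange(256)}.{rng.randrange(1, 255)}"
    stamp = ts.strftime("%d/%b/%Y:%H:%M:%S %z")
    path = rng.choice(PATHS)
    status = rng.choice(STATUSES)
    size = rng.randrange(200, 5000)
    return f'{host} - - [{stamp}] "GET {path} HTTP/1.1" {status} {size}'


def generate(rate_per_sec: float, seed: int = 0, start: datetime = None):
    """Yield log lines, sleeping as needed so output does not exceed rate_per_sec."""
    rng = random.Random(seed)
    ts = start or datetime.now(timezone.utc)
    interval = 1.0 / rate_per_sec
    next_emit = time.monotonic()
    while True:
        now = time.monotonic()
        if now < next_emit:
            time.sleep(next_emit - now)
        next_emit += interval                # fixed schedule avoids drift
        ts += timedelta(seconds=interval)    # synthetic event time advances in step
        yield access_line(rng, ts)


t = time.monotonic()
lines = list(itertools.islice(
    generate(5500, start=datetime(2014, 8, 1, tzinfo=timezone.utc)), 500))
elapsed = time.monotonic() - t
print(lines[0])
print(f"{len(lines) / elapsed:,.0f} lines/second")
```

For scale: the target load of about 447,392 events per minute averages roughly 7,457 events per second, above either rate listed, so if those rates are per generator instance (the slide does not say), at least two instances would be needed to reproduce the full load.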
Accurate Sizing
Sizing calculator, evaluated at load levels of 20k, 50k, 100k and 200k events/min
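The calculator itself did not survive extraction. A minimal Python sketch of the storage arithmetic such a calculator performs for the 30-day raw-data retention at the load levels listed; the 250-byte event size and 30-day window come from the input-data slide and 3 replicas is the HDFS default `dfs.replication`, while the compression ratio, free-space headroom and 4 TB of disk per node are placeholder assumptions to replace with measured values:

```python
import math
from dataclasses import dataclass


@dataclass
class SizingInput:
    events_per_min: int
    bytes_per_event: int = 250      # upper bound from the input-data slide
    raw_retention_days: int = 30    # "Last 30 days: raw data"
    replication: int = 3            # HDFS default dfs.replication
    compression_ratio: float = 1.0  # assumption: uncompressed; use a measured value
    headroom: float = 0.25          # assumption: fraction of raw disk kept free


def raw_storage_tb(p: SizingInput) -> float:
    """Disk (TB, 10**12 bytes) needed to keep the raw-data retention window."""
    daily_bytes = p.events_per_min * 24 * 60 * p.bytes_per_event
    stored = daily_bytes * p.raw_retention_days / p.compression_ratio * p.replication
    return stored / (1 - p.headroom) / 1e12


def data_nodes(p: SizingInput, disk_tb_per_node: float) -> int:
    """Hypothetical helper: DataNodes needed for storage alone (ignores CPU/RAM)."""
    return math.ceil(raw_storage_tb(p) / disk_tb_per_node)


for rate in (20_000, 50_000, 100_000, 200_000):
    p = SizingInput(events_per_min=rate)
    print(f"{rate:>7,}/min -> {raw_storage_tb(p):6.2f} TB, "
          f"{data_nodes(p, disk_tb_per_node=4.0)} node(s) at 4 TB each")
```

This covers raw data only; per-minute and per-hour aggregates, intermediate job output and any compression gains all shift the result, which is why the inputs are kept as explicit parameters.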
References
1) Install Hadoop (CDH4) on 5 nodes with VMware, CDH4, Cloudera Manager 4: https://www.youtube.com/watch?v=CobVqNMiqww
2) Puppet & Vagrant Tutorial: http://puppetlabs.com/blog/puppet-and-vagrant-tutorial
3) Hardware for Hadoop: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.0/bk_cluster-planning-guide/content/hardware-selection-for-hbase.html
4) How to Refine and Visualize Server Log Data: http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-server-log-data/
5) Hadoop Cluster Sizing: http://hortonworks.com/wp-content/uploads/downloads/2013/06/Hortonworks.ClusterConfigGuide.1.0.pdf
Thank You!
SoftServe US Office
One Congress Plaza, 111 Congress Avenue, Suite 2700
Austin, TX 78701
Tel: 512.516.8880
Contacts
Valentyn Kropov
vkrop@softserveinc.com
Tel: 866.687.3588 x4341