How CBS Interactive uses Cloudera Manager to effectively manage their Hadoop cluster
Webinar: How CBS Interactive Uses Cloudera Manager to Effectively Manage Their Hadoop Cluster
Wednesday, September 19th, 2012
Manoj Murumkar - Senior Manager, Data Engineering, CBS Interactive
Bala Venkatrao – Director of Products, Cloudera
Agenda
Introductions
CBSi
• Hadoop Use Case
• Operational Challenges
• How Cloudera Manager Helps CBSi & Demo
• Benefits of Using Cloudera Manager
Cloudera Manager
• Overview & Benefits
• Key Features
• Roadmap
Q&A
©2012 Cloudera, Inc. All Rights Reserved.
Introductions
Manoj Murumkar, Senior Manager, Data Engineering at CBS Interactive
Manoj has been working with data technologies since 1998. His team is currently responsible for providing and operating data infrastructure solutions for the internet division of CBS Corporation. He has been involved with Hadoop for around 3 years, about 2 of them working with Cloudera. His team built, from the ground up, the big data infrastructure that supports clickstream analysis using Hadoop Streaming.
Bala Venkatrao, Director, Products at Cloudera
Bala Venkatrao is part of the product management team at Cloudera and leads the efforts around Cloudera Manager. He is also involved in several other initiatives, including customer advocacy, partnership development and marketing.
Building Web Analytics on Hadoop for a Top 10 Global Web Property
235M worldwide monthly unique users
Challenge
• Requires advanced analytics on clickstream data in near real time
• Weblog processing time on the proprietary platform hit its limit while data volumes continuously increased
• Ability/cost to store historical data for analysis
Solution
• Web analytics platform on Hadoop processes >1B global events/day
• >1PB on Hadoop; 42 nodes
• Tracking clicks, page views, downloads, streaming video events, ad events, etc.
• Hadoop components: HDFS, Hive, MapReduce, Pig, Hadoop Streaming
• Optimizing what content is placed beside what the user is currently reading
Results
• Reduced processing time by 6+ hrs. to meet the SLA
• Accommodates a 50% data volume increase per year
• Reduced cost of storing/processing data
• Greater ad revenues achieved
Source: Hadoop World 2012 presentation. Michael Sun, Lead Software Engineer & Manager of DW Operations, CBS Interactive. http://www.cloudera.com/resource/hadoop-world-2012-presentation-slides-building-web-analytics-processing-on-hadoop-at-cbs-interactive/
CBSi Hadoop Operational Challenges Prior to Cloudera Manager
Lack of…
• Holistic view
• Configuration control
No audit trail/history of changes
Existing solutions (Ganglia, Hadoop web UI pages and custom scripts) were difficult to maintain
No visibility into activity failures
• Reactive to user complaints about failed/long-running jobs
How Cloudera Manager Helps CBSi with Hadoop Operations
Intuitive visual interface
• Can manage and monitor the whole cluster
Overall health status/dashboard
• Ability to drill down from services > roles > hosts
Service monitoring and alerting
• Makes Hadoop operations proactive
• Heatmaps provide an easy way to identify outliers
Activity monitoring
• Helps identify failed or slow-running jobs
• Notify end users of failed jobs and manage SLAs
Workflows
• Simple to add new data nodes, hosts, clients, etc.
Key Benefits of Using Cloudera Manager at CBSi
Lowers the barrier for Hadoop administration
• No need to rely solely on experts
Makes life easier: saves money & time
• Avoids licensing costs associated with managing multiple tools
• Cuts technical and human resource costs
• Reduces time to manage and maintain the cluster
Provides a "one-stop" holistic view
• Easy to understand how the overall cluster is performing
Helps create repeatable processes & workflows for Hadoop operations
Improves efficiency of the Operations team
Why You Need Cloudera Manager
1. COMPLEXITY: HADOOP IS MORE THAN A DOZEN SERVICES RUNNING ACROSS MANY MACHINES
• HUNDREDS OF HARDWARE COMPONENTS
• THOUSANDS OF SETTINGS
• LIMITLESS PERMUTATIONS
2. CONTEXT: HADOOP IS A SYSTEM, NOT JUST A COLLECTION OF PARTS
• EVERYTHING IS INTERRELATED
• RAW DATA ABOUT INDIVIDUAL PIECES IS NOT ENOUGH
• MUST EXTRACT WHAT'S IMPORTANT
3. EFFICIENCY: MANAGING HADOOP WITH MULTIPLE TOOLS & MANUAL PROCESSES TAKES LONGER
• COMPLICATED, ERROR-PRONE WORKFLOWS
• LONGER ISSUE RESOLUTION
• LACK OF CONSISTENT AND REPEATABLE PROCESSES
Cloudera Manager Provides End-to-End CDH Administration
1. DEPLOY: INSTALL, CONFIGURE AND START YOUR CLUSTER IN 3 SIMPLE STEPS
2. CONFIGURE & OPTIMIZE: ENSURE OPTIMAL SETTINGS FOR ALL HOSTS AND SERVICES
3. MONITOR, DIAGNOSE & REPORT: FIND AND FIX PROBLEMS QUICKLY; VIEW CURRENT AND HISTORICAL ACTIVITY AND RESOURCE USAGE
Managing Complexity: One Tool for Everything
[Diagram: CLOUDERA ENTERPRISE vs. DO-IT-YOURSELF, compared across deployment & configuration, monitoring, workflows, events & alerts, log search, diagnostics, reporting and activity monitoring]
“In a recent Cloudera survey, >95% of respondents emphasized the need for a single end-to-end tool to manage their Hadoop Operations”
Providing Context: Raw Data vs. Hadoop Intelligence
1. SMART CONFIGURATION: AUTO-SETS CONFIGURATIONS & GUARDS AGAINST USER ERROR
2. WORKFLOWS: ENSURES THAT MULTI-STEP TASKS ARE ACCOMPLISHED COMPLETELY & IN THE CORRECT SEQUENCE
3. DEPENDENCIES: AWARE OF HOW A PARTICULAR ACTION AFFECTS THE REST OF THE CLUSTER & MANAGES THE IMPACT
4. EVENTS & ALERTS: MAKES YOU AWARE OF WHAT'S IMPORTANT AT A HADOOP SYSTEM LEVEL
5. HISTORY: COMPARES CURRENT & PAST ACTIVITIES FOR CONTEXT
Cloudera Manager Key Features
• Automated Deployment: Installs the complete Hadoop stack in minutes via a wizard-based interface
• Centralized Management: Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface
• Multi-Cluster Management: Allows you to manage multiple clusters from a single instance of Cloudera Manager
• LDAP Authentication: Integrates Cloudera Manager with Active Directory
• Global Time Control: Establishes the time context globally for almost all views; correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis
• Service & Configuration Management: Set server roles, configure services and manage security across the cluster; gracefully start, stop and restart services as needed
• Role-Based Administration: Supports Administrator and Read-Only users
• Audit Trails: Maintains a complete record of configuration changes, with the ability to roll back to previous states
• Proactive Health Checks: Monitors dozens of service performance metrics and alerts you when you approach critical thresholds
Cloudera Manager Key Features
• Intelligent Log Management: Gather, view and search Hadoop logs collected from across the cluster; scans Hadoop logs for irregularities and warns you before they impact the cluster
• Event Management: Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities, and makes them available for alerting and searching
• Alerting: Generates email alerts when certain events occur
• Activity Monitoring: Consolidates all cluster activity into a single, real-time view
• Host-Level Monitoring: View information about the hosts in your cluster, including status, resident memory, virtual memory and roles
• Heatmaps: Visualize health status and metrics across the cluster to quickly identify problem nodes and take action
• Operational Reports: Visualize current and historical disk usage by user, group and directory; track MapReduce activity on the cluster by job or user
• Support Integration: Takes a snapshot of the cluster state and automatically sends it to Cloudera Support to assist with resolution
• Comprehensive API: Easily integrate Cloudera Manager with your existing enterprise-wide management and monitoring tools
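As a rough illustration of that kind of integration, the Python sketch below lists the clusters managed by a Cloudera Manager server over its REST/JSON API. It assumes the API is served at /api/v1 on the default port 7180 over plain HTTP with basic authentication, and that list responses wrap results in an "items" array; the host name and credentials are placeholders, and the API versions actually available depend on the Cloudera Manager release in use.

```python
import base64
import json
import urllib.request


def clusters_url(host: str, port: int = 7180, api_version: int = 1) -> str:
    """Build the cluster-list URL (assumes the default port 7180 and plain HTTP)."""
    return f"http://{host}:{port}/api/v{api_version}/clusters"


def cluster_names(payload: str) -> list:
    """Extract cluster names from a JSON list response shaped like {"items": [...]}."""
    return [item["name"] for item in json.loads(payload).get("items", [])]


def fetch_cluster_names(host: str, user: str, password: str) -> list:
    """GET the cluster list using HTTP basic authentication."""
    request = urllib.request.Request(clusters_url(host))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return cluster_names(response.read().decode("utf-8"))
```

A monitoring script could call fetch_cluster_names("cm-host.example.com", "admin", "secret") (hypothetical host and credentials) and feed the result into an existing dashboard; keeping the JSON parsing in its own function lets it be tested without a live server.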
Cloudera Manager Roadmap
• Maintenance mode
• Platform support: manage additional services such as Flume, Hive, etc.
• Monitoring: ZooKeeper monitoring; advanced HBase monitoring
• Rolling upgrades
• Usability enhancements: improved error handling; log search enhancements; enhanced charting
Why Enterprises are Standardizing on Cloudera Manager
1. SIMPLE: END-TO-END HADOOP ADMINISTRATION IN A SINGLE TOOL
2. INTELLIGENT: MANAGES HADOOP AT THE SYSTEM LEVEL; CLOUDERA'S EXPERIENCE REALIZED IN SOFTWARE
3. EFFICIENT: SIMPLIFIES COMPLEX WORKFLOWS & MAKES ADMINISTRATORS MORE EFFICIENT
4. BEST-IN-CLASS: THE ONLY ENTERPRISE-GRADE HADOOP MANAGEMENT APPLICATION AVAILABLE
Next Steps
• Try out the FREE edition of Cloudera Manager
• Download from: http://www.cloudera.com/products-services/tools/
• Support available via [email protected]
• For Cloudera Enterprise subscriptions, please contact: [email protected]