Post on 01-Jul-2015
IBM Big Data & Analytics© 2013 IBM Corporation 1
© 2014 IBM Corporation
Information Management
BigInsights: Technical Overview
OC Big Data Meetup
Lynn Hedegard
Technical Sales Specialist
West Region
15th of October, 2014
Real-Time CRM in the Social World (Meet Lisa)
Diagram components: Telco Customer Profile, Retailer Customer Profile, Retailer Fan Page, Intelligent Advisor Platform, Product Catalog
• Lisa registers with Retailer. Gives Retailer & Telco permissions to “Opt In”
• Lisa “follows” a friend’s post on FB and clicks the “Like” button on an item she likes
• The “Intelligent Advisor” platform processes Lisa’s recent on-line activity and constructs a targeted offer based on recent behavior AND internal marketing strategy
• Lisa receives a message with an offer reminding her to stop by if she’s in the area
• While walking past the store, Lisa receives a promo code for a product we think she might like
• Lisa uses promo code to purchase product from offer AND a few more items that go with the outfit ☺
Problem Statement — Complex Environment
• The Local Environment is Complex:
• A single large retail store (1.5 million SKUs)
• Large manufacturing floor (~6 million parts)
• Vegas Casino (20 million card-carrying customers)
• The Global Environment is Complex:
• The number of variables affecting business performance is huge.
• US citizens (source: google population)
• 300+ Million total
• (21M+ teenagers) + (40M+ in their 20’s) (that’s a lot of calls & text messages!)
• The interrelationships between these variables are very complex (e.g., an N² problem)
• Multiple customer touch points
• Multiple suppliers & distribution methods
• Market forces (cost of raw goods & services, pricing dynamics, supply/demand)
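The "N² problem" mentioned above can be made concrete with a small sketch. It assumes the phrase refers to pairwise relationships between variables, of which there are n(n - 1)/2 for n variables; the variable names are hypothetical, and the 1.5 million figure is simply the SKU count quoted earlier on this slide.

```python
from itertools import combinations


def pairwise_relationships(n: int) -> int:
    """Number of distinct unordered pairs among n variables: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical business variables, for illustration only.
variables = ["price", "supplier", "channel", "demand"]
pairs = list(combinations(variables, 2))
print(len(pairs), "pairs among", len(variables), "variables")

# Growth is quadratic: the last row uses the 1.5 million SKU figure from the slide.
for n in (10, 1_000, 1_500_000):
    print(f"{n:>9,} variables -> {pairwise_relationships(n):,} pairwise relationships")
```

Even before considering higher-order interactions, the pair count alone outgrows what a person can inspect by hand, which is the point the slide is making.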
• Working Premise: Few people in the enterprise can make “good” Operational Decisions consistently & quickly
• Few people can “see” all the necessary data.
• Few people can “analyze” all the necessary data.
• Few people understand all the inter-relationships between business variables.
Businesses can no longer tolerate inconsistent Business Processes
IBM’s Big Data Reference Architecture — High Level
Analytic Applications: BI and Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Systems Management, Application Development, Visualization & Discovery, Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse
An Enterprise Eco-System for Big Data
• Integration of all classes of Data Repositories (e.g. DW, Hadoop, & Streaming Data)
• Management
• Enterprise Class Security & Data Governance
• Workload Optimization
• Workload Scheduling
• Dynamic Reconfiguration
• Advanced Analytics
• Complete set of reusable analysis components (i.e., Accelerators)
• Apply analysis to data in its native form (i.e., in the repository)
• Data Exploration of data from myriad repositories using a common interface
• Powerful Visualization Tools
• Eclipse based Development Environments
Application Accelerators Improve Time to Value
• Finance Analytics: Streaming options trading; Insurance and banking DW models
• Telecommunications: CDR streaming analytics; Deep Customer Event Analytics
• Social Data Analytics: Sentiment Analytics, Intent to purchase
• Machine Data Analytics: Operational data including logs for operations efficiency
• Text Analytics: Natural Language Processing; Multi-Language Support; Domain Specific
IBM’s Big Data / Analytics Reference Architecture
Diagram components, left to right:
• Data Sources: Traditional Data Sources (Third-Party Data, Transactional Data, Application Data); New Data Sources (Machine & Sensor Data, Image & Video, Enterprise Content Data, Social Data, Internet Data)
• Data Acquisition & Application Access
• Analytical Sources: Streaming Computing (Real-Time Analytical Processing); Data Integration (Data Quality, Xfrm & Load); Landing, Exploration & Archive (Big Data Repository); Integrated Data Warehouse (Enterprise Warehouse); Deep Analytics & Modeling (Analytical Appliances); Interactive Analysis & Reporting (Data Marts)
• Shared Operational Information: Master & Reference, Content Hub, Activity Hub, Metadata Catalog
• Actionable Insight: Decision Management, Modeling & Predictive Analytics, Discovery & Exploration, Analysis & Reporting, Planning & Forecasting, Content Analytics
• Enhanced Applications: Customer Experience, Financial Performance, New Business Model, Risk, Operations & Fraud, IT Economics
• Spanning all layers: Security & Business Continuity Management, Event Detection and Action, Platforms, Governance
Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
• Business Users: Determine what question to ask
• IT Group: Structures the data to answer that question
• Examples: Monthly sales reports, Profitability analysis, Customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT Group: Delivers a platform to enable creative discovery
• Business Users & Data Scientists: Explore what questions could be asked
• Examples: Brand sentiment, Product strategy, Maximum asset utilization
BigInsights
BigInsights: Value Beyond Open Source
Key differentiators
• Built-in text analytics
• Enterprise software integration
• SQL support
• Spreadsheet-style analysis
• Integrated installation of supported open source and other components
• Web Console for admin and application access
• Platform enrichment: additional security, performance features, GPFS (alternative file system), . . .
• World-class support
• Full open source compatibility
Business benefits
• Quicker time-to-value due to IBM technology and support
• Reduced operational risk
• Enhanced business knowledge with flexible analytical platform
• Leverages and complements existing software
Diagram layers
• IBM’s Value Add: Visualization & Exploration, Development Tool, Advanced Engines, Connectors, Workload Optimization, Administration & Security
• Open Source Components: IBM-certified Apache Hadoop and related projects
BigSheets
• Model “big data” collected from various sources in spreadsheet-like structures
• Filter and enrich content with built-in functions
• Combine data in different workbooks
• Visualize results through spreadsheets, charts
• Export data into common formats (if desired)
No programming knowledge needed!
Social Data Analytics Accelerator
What does it do?
• Provides the ability to analyze large volumes of various types of social media data with real-time processing
Example Application: Movie Campaign Effectiveness
• Large Movie Studio wants to understand reaction to movie commercials around events (e.g., Super Bowl)
• Over 30 Million social media consumer profiles built and used in the analysis
• Real-time summary of insights correlated with the airing of the commercial
Why should you care?
• It enables clients to easily obtain insights necessary for:
– Effective/targeted Marketing Campaigns
– Timely product/marketing decisions
– Gaining competitive Intelligence
– Building customer retention and new customer acquisition programs
Big SQL
• Standard SQL syntax and data types
• Joins, unions, aggregates . . .
• VARCHAR, decimal, TIMESTAMP, . . .
• JDBC/ODBC drivers
• Prepared statements
• Cancel support
• Database metadata API support
• Secure socket connections (SSL)
• Optimization
• MapReduce parallelism, or “local” access for low-latency queries
• Varied storage mechanisms appropriate for Hadoop ecosystem
• Integration
• Eclipse tools
• DB2, Netezza, Teradata (via LOAD)
• Cognos Business Intelligence
. . .
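To make the "standard SQL" bullets concrete, here is a minimal sketch of a join, an aggregate, and a parameterized (prepared-style) statement. It is not Big SQL itself: an in-memory SQLite database stands in for a Big SQL connection, and the table names, columns, and rows are hypothetical. The same query shape is what a JDBC/ODBC client would send.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a Big SQL connection in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region VARCHAR(20));
    CREATE TABLE orders (customer_id INTEGER, amount DECIMAL(10, 2), placed_at TIMESTAMP);
    INSERT INTO customers VALUES (1, 'West'), (2, 'East');
    INSERT INTO orders VALUES
        (1, 40.00, '2014-10-01 09:00:00'),
        (1, 60.00, '2014-10-02 10:30:00'),
        (2, 25.00, '2014-10-03 14:15:00');
""")


def sales_by_region(connection, min_amount):
    """Join two tables and aggregate per region, using a bound parameter."""
    query = """
        SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        WHERE o.amount >= ?
        GROUP BY c.region
        ORDER BY c.region
    """
    return connection.execute(query, (min_amount,)).fetchall()


for region, order_count, total in sales_by_region(conn, 20):
    print(region, order_count, total)
```

Binding `min_amount` as a parameter rather than formatting it into the string mirrors the prepared-statement support listed on the slide.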
Big R
“End-to-end integration of R into IBM BigInsights”
Diagram components: R Clients, Scalable Statistics Engine, Data Sources, Embedded R Execution, R Packages
1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm
2. Scale out R
• Partitioning of large data (“divide”)
• Parallel cluster execution of pushed down R code (“conquer”)
• All of this from within the R environment (Jaql and MapReduce are hidden from you)
• Almost any R package can run in this environment
3. Scalable machine learning
• A scalable statistics engine that provides canned algorithms, and an ability to author new ones, all via R
Pull data (summaries) to the R client, or push R functions right on the data
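The "divide" and "conquer" steps above can be sketched in a few lines. This is a Python analogue of the pattern, not the Big R API: the function names are hypothetical, and a local thread pool stands in for parallel cluster execution.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_parts):
    """'Divide': split the data into roughly equal contiguous chunks."""
    size = max(1, -(-len(values) // n_parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_summary(chunk):
    """Function 'pushed down' to one partition; returns a small summary."""
    return (len(chunk), sum(chunk))


def parallel_mean(values, n_parts=4):
    """'Conquer': summarize each partition in parallel, then combine the summaries."""
    chunks = partition(values, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        summaries = list(pool.map(partial_summary, chunks))
    count = sum(n for n, _ in summaries)
    total = sum(s for _, s in summaries)
    return total / count


data = list(range(1, 101))
print(parallel_mean(data))
```

Only the per-partition summaries travel back to the caller, which corresponds to the "pull data (summaries) to the R client" path on the slide.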
Text Analytics Toolkit
• Mature System: “System T” text analytics engine embedded in IBM products
• Found in Lotus Notes, IBM e-discovery Analyzer, CCI, InfoSphere Warehouse, and others
• Almost a decade since initial release
• Extensible: User can customize Text Analytics Engine
• Toolkit: BigInsights Text Analytic Toolkit provides
• Developer tools
• Easy to use text analytics language
• Set of extractors for fast adoption
• Multilingual support, including support for DBCS languages
• AQL: BigInsights includes Annotator Query Language (AQL): SQL-like!
• Fully declarative text analytics language
• No “black boxes” or modules that can’t be customized.
• Tooling for easy customization because you are abstracted from the programmatic details
• Competing solutions make use of locked-up black-box modules that cannot be customized, which restricts flexibility and makes them difficult to optimize for performance
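The idea of declarative extractors can be illustrated with a small sketch in which rules are declared as data and one generic function applies them. This is Python with regular expressions, not AQL, and it says nothing about AQL syntax; the rule names, patterns, and sample text are hypothetical.

```python
import re

# Hypothetical extraction rules, declared as data rather than as procedural code.
RULES = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract(text, rules=RULES):
    """Apply every rule and return (label, matched_text, start, end) tuples in text order."""
    spans = []
    for label, pattern in rules.items():
        for match in pattern.finditer(text):
            spans.append((label, match.group(), match.start(), match.end()))
    return sorted(spans, key=lambda span: span[2])


sample = "Call 555-123-4567 or write to lisa@example.com."
for span in extract(sample):
    print(span)
```

Because each rule is plain data, customizing an extractor means editing or adding an entry rather than changing the engine, which is the "no black boxes" property the slide emphasizes.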
BigInsights Enterprise Edition
Diagram: IBM InfoSphere BigInsights
Legend: Open Source, IBM Value Add
Layer labels: Visualization and Discovery, Deep Analytics, Analytics of Data in Motion, Data Integration, System Mgmt, Parallel Processing Engines, File Systems, Infrastructure
Components: Cognos BI, DataStage, Guardium, Data Explorer, Flume, R, Streams, Netezza, DB2, Sqoop, JDBC, HDFS, MapReduce, Hive, Pig, HCatalog, ZooKeeper, HBase, Jaql, Oozie, Big SQL, GPFS-FPO, Lucene, Flexible Scheduler, Indexing, Enhanced Security, Adaptive MapReduce, Text Compression, Integrated Installer, Machine Learning, DB Import, DB Export, Distributed File Copy, BoardReader, Web Crawler, Accelerator for Social Data Analysis, Accelerator for Machine Data Analysis, Text Processing Engine & Library, BigSheets, Dashboards and Visualizations, Deploy Applications, Monitor Workflow, Dynamic Configuration
Web Console
Welcome Tab: Your Starting Point
Tasks: Where and how to begin performing common administrative or analytical tasks
Quick links to common functions
Learn more through external Web resources
Overview of Web Console Capabilities
• Manage BigInsights
• Inspect / monitor system health
• Add / drop nodes
• Start / stop services
• Launch / monitor jobs
• Explore / modify file system
• Create custom dashboards
• . . .
• Launch applications
• Spreadsheet-like analysis tool
• Pre-built applications (IBM supplied or user developed)
• Publish applications
• Monitor cluster, applications, data, etc.
BigInsights Applications Catalog (Web Console)
• Browse available applications
• Manage and deploy applications (administrators only)
• Execute (or schedule execution of) a deployed application
• Monitor job (application) status
• Link or chain applications for sequential execution
BigSheets
Why Did IBM Develop BigSheets?
A browser-based analytics tool for business users.
Why BigSheets?
• Business users need an intuitive non-technical approach for analyzing Big Data.
• Translating untapped data into actionable business insights is a common requirement.
• Visualizing and drilling down into enterprise and Web data promotes new business intelligence.
How can BigSheets help?
• Spreadsheet-like interface enables business users to gather and analyze data easily.
• Built-in “readers” can work with data in several common formats (JSON arrays, CSV, TSV, Web crawler output, . . . )
• Users can combine and explore various types of data to identify “hidden” insights.
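BigSheets performs these steps through its spreadsheet interface without any programming. For readers who think in code, the following sketch shows roughly what a "reader" and a workbook combination amount to. It is not BigSheets code: the function names and sample data are hypothetical, and only JSON arrays, CSV, and TSV are handled.

```python
import csv
import io
import json


def read_rows(text, fmt):
    """Return a list of dict rows from JSON-array, CSV, or TSV text."""
    if fmt == "json":
        return list(json.loads(text))
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    raise ValueError(f"unsupported format: {fmt}")


def combine(left, right, key):
    """Join two row sets on a shared key, similar to combining two workbooks."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Hypothetical sample data, for illustration only.
posts = read_rows('[{"user": "lisa", "likes": 3}, {"user": "sam", "likes": 1}]', "json")
profiles = read_rows("user,city\nlisa,Irvine\n", "csv")
print(combine(posts, profiles, "user"))
```

Once every source is reduced to the same row shape, combining and filtering no longer depend on the original file format.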
Accessing BigSheets
• Ensure BigInsights Enterprise is running
• Launch the Web console with URL http://<host>:<port> or http://<host>:<port>/data/html/index.html
• Follow on-screen Task prompt or click on the BigSheets tab
BigSQL
MS Excel: Big SQL integration via ODBC
Demo
Analyst Comments Regarding BigInsights
The Forrester Wave™ - Hadoop Solutions Q1 2014
• Hadoop momentum is unstoppable
• Its open source roots grow deeply and wildly into the enterprise. Its refreshingly unique approach is transforming how companies process, analyze and share big data
• Hadoop vendors face a cut-throat market
• The buying cycle is on the upswing, and Hadoop vendors know it. Pure-play upstarts must capture market share quickly to make investors happy; stalwart enterprise vendors need to avoid being disintermediated; cloud vendors must make solutions cheaper.
• Hadoop is open, but vendors add differentiated features
• Hadoop is an Apache open-source project that anyone can download for free. Vendors all support, extend and augment Apache Hadoop and add differentiated features.
• Distributed computing platforms not new to IBM
• Advanced analytic tools
• Global presence
• Deep implementation services
• Complete big data solution
• Compelling roadmap
http://www.forrester.com/pimages/rws/reprints/document/112461/oid/1-PBE69P
The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
InfoSphere BigInsights 3.0 – Worth a look!
Products compared: IBM InfoSphere BigInsights, Cloudera CDH5, Hortonworks HDP 2.1, MapR 3.1, Pivotal HD 2.0, Amazon Elastic MapReduce
Capabilities compared:
• Open Source Hadoop Components: Pig, Hive, HBase, Oozie, Avro, etc.
• Big SQL: Rich, high-performance, ANSI-compliant SQL on Hadoop
• BigSheets: Spreadsheet-style visualization tool for business users
• Text Analytics Accelerator: Simplified development for text analytics (AQL)
• Social Data Accelerator: Developer toolkit for social media applications
• Machine Data Accelerator: Developer toolkit for building log analytics apps
• Adaptive MapReduce: High-performance MR with recoverable jobs
• GPFS-FPO: POSIX, HDFS-compatible file system with enterprise features
• IDE: Eclipse-based integrated development environment
• Big R: Full R language integration
• Watson Explorer: Search and index all data within BigInsights
BigInsights On-Line Resources
InfoSphere BigInsights 3.0 – QuickStart Edition
• Free, no limit, non-production version of BigInsights
• Big SQL, BigSheets, Text Analytics, Big R, management console, development tools
• Tutorials and education
• Installable images or VM
• Single or multi-node clusters
• Over 53,000 downloads to date
http://IBM.co/QuickStart
http://www.ibm.com/developerworks/downloads/im/biginsightsquick/
http://www.ibm.com/software/data/infosphere/biginsights/quick-start/
Web Resources
External Hadoop Resource
• IBM.com/Hadoop
• Messaging aimed at Hadoop and open source enthusiasts
• Extensive resources, links to other IBM Big Data sites
External BigInsights Resource
• Developer.IBM.com/Hadoop
• Referred to as “Hadoop.dev”
• Site and resources tailored to technical buyers and evaluators
BigSQL Value Add To Hadoop
• SQL on Hadoop without Compromise
• http://public.dhe.ibm.com/common/ssi/ecm/en/sww14019usen/SWW14019USEN.PDF
• New Big SQL Datasheet: covers key value propositions & differentiation, plus Hive 0.12 vs. Big SQL 3.0 benchmarks (20x performance advantage on average)
• Key Big SQL advantages
• Enterprise features
• Compatibility
• Performance
• Federation
IBM BigInsights on Cloud
• Enterprise Hadoop as a Service: focus on analyzing data using BigInsights features, including Big SQL, BigSheets and text analytics, rather than managing infrastructure
• High performance hardware environment: Hadoop-specific reference architecture implemented on dedicated bare metal nodes
• Auto-provision BigInsights on nodes through a simple web interface
InfoSphere BigInsights
Thank You