IBM Big Data Platform 2013-may-14 finalsesam.smart-lab.se/seminarier/sem130514/SS130514IBM.pdf ·...
IBM Big Data Platform
Turning big data into smarter decisions
© 2013 IBM Corporation
May 16, 2013
Stefan Söderlund, IBM client architect, Swedish Armed Forces (Försvarsmakten)
Sesam spring seminar: “Big Data, Bigga byte kräver Pigga Hertz!” (roughly: “Big bytes demand lively hertz!”)
By 2015, 80% of all available data will be uncertain
[Figure: global data volume in exabytes (axis 0 to 9000) and aggregate uncertainty % (axis up to 100), 2005 to 2015. Multiple sources: IDC, Cisco]
By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
Smarter Defence
• Ever increasing range of sensors
• Volume, velocity, variety
• Military collectors & open source
• Agility & mobility
• Highly connected systems – blurred edges
• Collaboration across coalitions
• From data to actionable intelligence
• From reactive to proactive
• Whole lifecycle system optimisation
Instrumented, Interconnected, Intelligent
Sustained Information Superiority
Big data is a hot topic because technology makes it possible to analyze ALL available data
Cost-effectively manage and analyze all available data in its native form: unstructured, structured, streaming
[Diagram: data sources shown are ERP, CRM, RFID, website, network switches, social media, command and control]
In order to realize new opportunities, you need to think beyond traditional sources of data
Transactional & Application Data
• Volume
• Structured
• Throughput
Machine Data
• Velocity
• Semi-structured
• Ingestion
Social Data
• Variety
• Highly unstructured
• Veracity
Enterprise Content
• Variety
• Highly unstructured
• Volume
Analysis expanding from enterprise data to big data, creating new cost-effective opportunities for competitive advantage
Traditional Approach: structured, analytical, logical
– Data Warehouse
– Structured, repeatable, linear
– Traditional sources: transaction data, internal app data, ERP data, core business data, OLTP system data
New Approach: creative, holistic thought, intuition
– Hadoop, streaming data
– Unstructured, exploratory, iterative
– New sources: web logs, URLs, social data, text data (emails, chats), RFID, sensor data, network data
Enterprise-Wide Integration
The IBM Big Data Platform
• Process any type of data
– Structured, unstructured, in-motion, at-rest
• Built-for-purpose engines
– Designed to handle different requirements
• Analyze data in motion
• Manage and govern data in the ecosystem
• Enterprise data integration
• Grow and evolve on current infrastructure
[Diagram: IBM Big Data Platform. Components shown: Solutions; Analytics and Decision Management; Visualization & Discovery; Application Development; Systems Management; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance; Big Data Infrastructure]
The IBM Big Data Platform
Data Warehouse
Delivers deep insight with advanced in-database analytics & operational analytics
• PureData System – expert integrated systems to make deep and operational analytics faster & simpler
• InfoSphere Warehouse – data warehouse software to access operational info in real time
The IBM Big Data Platform
Stream Computing
Analyze streaming data and large data bursts for real-time insights
• InfoSphere Streams – software enabling continuous analysis of massive volumes of streaming data with sub-millisecond response times
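The idea of continuous analysis over a stream can be pictured with a small sliding-window sketch. This is plain Python for illustration only, not InfoSphere Streams or its API; the function name `rolling_alerts`, the window size, and the threshold are hypothetical choices made for this example.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(
    readings: Iterable[float], window: int = 5, threshold: float = 10.0
) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, window_mean) whenever a reading deviates from
    the mean of the previous `window` readings by more than `threshold`.

    Hypothetical helper for illustration; not part of any IBM product.
    """
    recent = deque(maxlen=window)  # holds only the last `window` readings
    running_sum = 0.0
    for i, value in enumerate(readings):
        if len(recent) == window:
            mean = running_sum / window
            if abs(value - mean) > threshold:
                yield i, value, mean
            running_sum -= recent[0]  # oldest reading is evicted by append below
        recent.append(value)
        running_sum += value


if __name__ == "__main__":
    # Made-up sensor readings; the sixth value is an obvious outlier.
    for index, value, mean in rolling_alerts([20, 21, 19, 20, 20, 45, 20]):
        print(f"reading {index}: {value} deviates from window mean {mean}")
```

Each reading is processed once as it arrives and only the last few values are kept in memory, which is the property that lets this style of analysis run on data in motion rather than on stored data.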
The IBM Big Data Platform
Hadoop System
Cost-effectively analyze petabytes of unstructured and structured data
• InfoSphere BigInsights – enterprise-grade Hadoop system enhanced with advanced text analytics, data visualization, tools, & performance features for analyzing massive volumes of structured and unstructured data.
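Hadoop's core programming model is MapReduce. As a rough illustration of that model, here is a minimal single-process word count in plain Python. It does not use Hadoop or BigInsights, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big bytes"]))
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data across the network; the sketch only shows the shape of the computation.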
BigInsights Content
Function | Version | Basic Edition | Enterprise Edition
Integrated Install | | Inc | Inc
Hadoop (including common utilities, HDFS, MapReduce framework) | 1.0.3 | Inc | Inc
Jaql (programming / query language) | 0.5.2 | Inc | Inc
Pig (programming / query language) | 0.10.0 | Inc | Inc
Flume (data collection/aggregation) | 0.9.4 | Inc | Inc
Hive (data summarization/querying) | 0.9.0 | Inc | Inc
Lucene (text search)* | 3.3.0 | Inc | Inc
Zookeeper (process coordination) | 3.4.3 | Inc | Inc
Avro (data serialization) | 1.6.3 | Inc | Inc
HBase (real time read/write) | 0.94.0 | Inc | Inc
HCatalog (table and storage management service) | 0.4.0 | Inc | Inc
Sqoop (RDBMS bulk data transfer) | 1.4.1 | Inc | Inc
Oozie (workflow / job orchestration) | 3.2.0 | Inc | Inc
Online documentation | | Inc | Inc
Integration with JDBC sources through general-purpose Jaql module | | Inc | Inc
Integration with DB2 (sample functions to submit jobs, read data) | | Inc | Inc
BigInsights Content (cont’d)
Function | Basic Edition | Enterprise Edition
Integration with R (Jaql module to invoke R statistical capabilities from BigInsights) | n/a | Inc
Integration with Netezza, DB2 LUW with DPF from Jaql | n/a | Inc
LDAP authentication, Guardium support, etc. | n/a | Inc
Integrated Web Console | n/a | Inc
Business process accelerators (social data, machine data analytics) | n/a | Inc
Platform performance enhancements (Adaptive MapReduce, large scale indexing, efficient processing of compressed text files, flexible job scheduler, etc.) | n/a | Inc
Text analytics | n/a | Inc
Eclipse tools for text analytic development, Jaql, Hive, Java | n/a | Inc
Applications for data import/export, Web crawl, machine learning, etc. | n/a | Inc
Web-based application catalog | n/a | Inc
Spreadsheet-like analytical tool | n/a | Inc
IBM support | Opt | Inc
Streams, Data Explorer, Cognos BI (limited use licenses) | n/a | Inc
Unlimited storage | n/a | Inc
The IBM Big Data Platform
Information Integration & Governance
Govern data quality and manage the information lifecycle
• InfoSphere Information Server – cleanses data, monitors quality and integrates big data with existing systems
• InfoSphere Optim – manages business information throughout its lifecycle
• InfoSphere Master Data Management – manages and maintains trusted views of master and reference data
• InfoSphere Guardium – real-time database security and monitoring
The IBM Big Data Platform
Accelerators
Speed time to value with analytic and application accelerators
• Analytic Accelerators – text analytics, geospatial, time-series, data mining
• Application Accelerators – defence services, financial services, machine data, social data, telco event data
• Industry Models – comprehensive data models based on deep expertise and industry best practice
The IBM Big Data Platform
Visualization & Discovery
Discover, understand, search, and navigate federated sources of big data
• InfoSphere Data Explorer – discovery and navigation software that provides real-time access and fusion of big data with rich and varied data from enterprise applications for greater insight
The IBM Big Data Platform
• Process any type of data
– Structured, unstructured, in-motion, at-rest
• Built-for-purpose engines
– Designed to handle different requirements
• Analyze data in motion
• Manage and govern data in the ecosystem
• Enterprise data integration
• Grow and evolve on current infrastructure
An example of the big data platform in practice
• Ingestion and Real-time Analytic Zone: Streams
• Landing and Analytics Sandbox Zone: Hadoop (MapReduce, Hive/HBase, column stores, documents in a variety of formats)
• Warehousing Zone: Enterprise Warehouse, Data Marts
• Analytics and Reporting Zone: BI & Reporting, Predictive Analytics, Visualization & Discovery
• Metadata and Governance Zone: ETL, MDM, Data Governance
• Connectors (shown twice in the diagram)
Applied Research: International Technology Alliance (ITA)
• Strategic Goals:
– Enhance distributed, secure, and flexible decision-making for coalition operations
– Enable the rapid and secure formation of ad hoc teams
• Coalition Focus:
– Develop interoperable data acquisition, processing, and management technologies
– Enable hybrid wireless networking among coalition partners
– Embed adaptable security in coalition networks and information services
– Techniques to represent, position, find, and link data/information to coalition decisions
[Diagram: research areas shown are Agile Security / Network Management; Secure Distributed Information Services; Hybrid Wireless Networking; Information Representation, Aggregation, and Fusion]
(Slide dated March 2011)
System Integration: UK Air Defence (UCCS Project)
Project Goal:
• Monitor UK airspace for terrorist or enemy incursions & initiate intercept
Solution:
• IBM (as prime contractor) implemented state-of-the-art air surveillance and interceptor command & control system
• Developed software applications, integrating multi-radar tracking and voice systems and refurbishing entire computer facilities at two RAF bases
Selected Benefits:
• Reduced cost (by using commercial software)
• Intuitive human-computer interface boosts controller performance & reduces training
• New levels of availability & maintainability
[Map: indicative locations]
Sensor Fusion – Surveillance / Border Control
Big Data
THINK
Get Started on Your Big Data Journey Today
Get Educated
– IBM Big Data: ibm.com/bigdata
– IBMBigDataHub.com
– BigDataUniversity.com
– IBV study on big data
– Books / analyst papers
The End