© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
HP Software - Big Data Challenges
February 2015
The world has changed…
[Figure: computing eras and the vendors, enterprise applications (ERP, CRM, SCM, HCM, PLM) and cloud, social and mobile apps that each era introduced]
Mainframe: kilobytes
Client/server: megabytes
The Internet: gigabytes
Big Data, cloud, mobility: zettabytes
Brontobytes + geopbytes
Every 60 seconds…
2,000 check-ins on Foursquare
$275,000 spent online shopping
204 million+ emails sent
48 hours new video on YouTube
38,000 new Tumblr blog posts
100,000+ tweets
2 million+ Google searches
35,000 brand “Likes” on Facebook
We have gone beyond the decimal system
Big Data from the “Internet of Things”
Today, data scientists use yottabytes to describe how much government data the NSA or FBI hold on people altogether.
In the near future, the geopbyte will be the unit used to describe the data generated by the IoT.
Geopbyte (10^30): this will take us beyond the standard decimal (SI) prefixes. This will be our digital universe tomorrow…
Brontobyte (10^27)
Yottabyte (10^24): this is our digital universe today
Zettabyte (10^21): 1.3 ZB of network traffic by 2016
Exabyte (10^18): 1 EB of data is created on the internet each day
Petabyte (10^15): the CERN Large Hadron Collider generates 1 PB per second
Terabyte (10^12): 500 TB of new data per day is ingested into Facebook databases
Gigabyte (10^9)
Megabyte (10^6)
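Each rung of this ladder is a further factor of 1,000, so moving a figure between units is just a shift of the decimal point. A minimal Python sketch, assuming decimal (1,000-based) units throughout. The labels "BB" and "GeB" for brontobyte and geopbyte are placeholders of our own, since neither name is an official unit; SI adopted the prefixes ronna (10^27) and quetta (10^30) in 2022.

```python
from decimal import Decimal

# Power of ten for each decimal unit on the slide (powers of 1,000, not 1,024).
# "BB" (brontobyte) and "GeB" (geopbyte) are hypothetical labels, not official
# symbols; SI adopted ronna (10^27) and quetta (10^30) in 2022.
EXPONENT = {
    "B": 0, "MB": 6, "GB": 9, "TB": 12, "PB": 15,
    "EB": 18, "ZB": 21, "YB": 24, "BB": 27, "GeB": 30,
}


def convert(value, from_unit: str, to_unit: str) -> Decimal:
    """Convert a quantity between decimal byte units without float rounding."""
    shift = EXPONENT[from_unit] - EXPONENT[to_unit]
    # Decimal(str(...)) keeps 1.3 as exactly 1.3; scaleb multiplies by 10**shift.
    return Decimal(str(value)).scaleb(shift)
```

For example, `convert(1.3, "ZB", "EB")` equals 1,300 EB and `convert(500, "TB", "PB")` equals 0.5 PB. `Decimal` is used instead of `float` so that inputs such as 1.3 are not silently rounded.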
Enterprise data growth
Costs of managing data
Every 60 seconds… 1,820 TB of data created
[Figure: computing eras and their applications, from mainframe (kilobytes), client/server (megabytes) and the Internet (gigabytes) to mobile, social, Big Data & the cloud (zettabytes)]
TCO for unstructured data ranges from $4/GB to $100/GB annually, but $25/GB is a good rule of thumb*
*Source: ESG White Paper – The Cost of Managing Unstructured Data, May 2014
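These per-GB rates turn directly into a rough annual budget estimate. The sketch below multiplies a data volume by the $4, $25 and $100 per GB per year figures quoted above. It assumes decimal units (1 TB = 1,000 GB), the 100 TB volume is a hypothetical example rather than a figure from the ESG paper, and it covers only the direct per-GB cost, not soft costs such as opportunity and risk.

```python
# Annual TCO rates quoted on the slide, in USD per GB per year.
LOW_RATE, RULE_OF_THUMB, HIGH_RATE = 4, 25, 100


def annual_tco(terabytes: float, usd_per_gb: float = RULE_OF_THUMB) -> float:
    """Estimate the yearly cost of keeping `terabytes` of unstructured data."""
    gigabytes = terabytes * 1000  # assumes decimal units: 1 TB = 1,000 GB
    return gigabytes * usd_per_gb


def tco_range(terabytes: float) -> tuple:
    """Return (low, rule-of-thumb, high) annual estimates."""
    return tuple(annual_tco(terabytes, rate) for rate in (LOW_RATE, RULE_OF_THUMB, HIGH_RATE))


# Hypothetical example: 100 TB of file shares and mailboxes.
low, typical, high = tco_range(100)
print(f"${low:,.0f} / ${typical:,.0f} / ${high:,.0f} per year")
# prints $400,000 / $2,500,000 / $10,000,000 per year
```

The wide spread between the low and high estimates is the point of the rule of thumb: without knowing which end of the range applies, $25/GB gives a defensible middle figure.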
The volume, velocity and breadth of channels often overwhelm information management strategies, leading to dark data.
Storage costs are visible; soft costs such as opportunity and risk costs are less so, but no less real…
What is legacy data and dark data?
Redundant, obsolete, trivial, and the unknown…
Legacy data resides in:
• Legacy applications and repositories
• Unmanaged SharePoint sites, file shares and mail systems
Legacy data can contain or be:
• Redundant: duplicates and unauthorized copies
• Obsolete: no longer in use or out of date, determined through creation, last modified or accessed date and retention policy
• Trivial: file types with no content value
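Each of the three legacy data categories maps to a simple, checkable rule: redundant files share identical content, obsolete files have not been created, modified or accessed within the retention period, and trivial files have a low-value file type. A minimal Python sketch of such a classifier, assuming files are already described by in-memory records. The 7-year retention period and the set of trivial extensions are hypothetical policy values, and content hashing only catches byte-identical copies, not near-duplicates.

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy values; a real retention schedule comes from records management.
RETENTION = timedelta(days=7 * 365)
TRIVIAL_EXTENSIONS = {".tmp", ".bak", ".log"}


@dataclass
class FileRecord:
    path: str
    content: bytes
    created: date
    last_modified: date
    last_accessed: date


def classify_rot(files, today):
    """Tag each file as redundant, obsolete and/or trivial (empty set = keep)."""
    first_copy = {}  # SHA-256 of content -> path of the first file seen with it
    labels = {}
    for f in files:
        tags = set()
        digest = hashlib.sha256(f.content).hexdigest()
        if digest in first_copy:
            tags.add("redundant")  # byte-identical duplicate of an earlier file
        else:
            first_copy[digest] = f.path
        last_used = max(f.created, f.last_modified, f.last_accessed)
        if today - last_used > RETENTION:
            tags.add("obsolete")  # untouched for longer than the retention period
        if os.path.splitext(f.path)[1].lower() in TRIVIAL_EXTENSIONS:
            tags.add("trivial")  # file type judged to carry no content value
        labels[f.path] = tags
    return labels


d = date(2014, 12, 1)
files = [
    FileRecord("reports/q3.docx", b"Q3 results", d, d, d),
    FileRecord("share/copy_of_q3.docx", b"Q3 results", d, d, d),
    FileRecord("archive/plan_2005.doc", b"2005 plan",
               date(2005, 1, 10), date(2005, 2, 1), date(2006, 3, 1)),
    FileRecord("tmp/scratch.tmp", b"scratch", d, d, d),
]
labels = classify_rot(files, today=date(2015, 2, 1))
```

A file can carry more than one tag (an old duplicate is both redundant and obsolete), so the function returns a set per path rather than a single label; the first copy of any content is kept untagged so that deduplication never removes every copy.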
What is dark data?
What lies hidden in your enterprise data… the unknown
Beyond legacy data…
Dark data tends to be:
• Human readable
• Unstructured
• Unindexed
• Unmanaged
• Inactive
• Orphaned
Dark data resides in:
• File servers
• SharePoint
• Email servers
The risk of ignoring legacy and dark data
Legacy & dark data sitting outside the information governance strategy exposes the organization to risk:
• Spiralling costs
– Expanding information footprint and storage costs
– Litigation and eDiscovery costs (“smoking gun” or inability to deliver)
• Security breaches and reputational damage
– Sensitive information unprotected (personally identifiable information, privacy regulations)
– Data leakage and misuse
• Poor business execution and performance
– Incorrect context
– Decisions based on outdated information
– Duplicate effort spent re-creating information
Today’s reality
86% of corporations cannot deliver the right information at the right time*
• 23% of data would be potentially useful if effectively engaged¹
• 3% of data is actually being tagged for Big Data value¹
• 0.5% of the digital universe is actually being tagged, analyzed and leveraged¹
*Source: IDC Predictions 2012: Competing for 2020
¹Source: IDC The Digital Universe in 2020, December 2012
Insight from 100% of the data
Data is exploding, but traditional data technologies impose limits. We need connected intelligence.
[Figure: structured data, human information and machine data brought together as Connected Intelligence]
…leaving a trail of digital footprints.
Engage 100% of data to gain competitive advantage
[Figure: accuracy and insight increase with data volumes, from traditional enterprise data (CRM, ERP, data warehouse) through Big Data (web, social, log files, machine data; semi-structured and unstructured) to dark data]
It takes a Big Data platform to “cash in” on all your data assets
Siloed data challenge
• Only 0.5% of data in the average organization is tagged and analyzed
• Information silos everywhere
• Tools for finding and understanding information are tied to the original application and format
• Queries take too long and are too rigid, making it difficult to uncover opportunities, emerging patterns and unexpected threats
Big Data platform needs
• Ad hoc discovery: find what’s in the data without pre-structuring it
• Ubiquitous but secure data access
• Real-time data collection and analysis, any format, any data source
• An extensible platform to harness 100% of data, on-premise and in the cloud
HP Haven Big Data platform
Gain insight from 100% of your data
• Analyze machine, business, human data
• Connect to any existing data source system
• Scale 50-1000x faster than legacy systems
• Develop modern data-driven applications & web services
[Figure: Haven platform architecture]
• Applications: HP applications, customer applications, developer applications
• Haven: defined programming interfaces; analytics, context and categorization; data connectors; scalable data stores
• Data sources: social media, IT/OT, images, audio, video, mobile, search engine, email, texts, documents, transactional data, records, compliance archives
• Deployment: on-premise or in the cloud
Use case #1: Smart / Safe City
• Deployment environment:
• Ingest data from 2,000+ CCTV cameras in Auckland
• View network of road and environmental sensors
• Social media trending, broadcast monitoring, and real-time web news
• Phase 1 – scene analysis and license plate recognition
• Future Phase - Integrate HP Vertica to uncover breaking trends and facilitate incident responses
• HP IDOL eduction sends interesting data to Vertica for statistical analysis and slice/dice
• Combine HP Vertica’s pattern-matching and graph-analysis at scale with HP IDOL’s ability to model concepts and enrich data
This is a rolling (up to 3 year) roadmap and is subject to change without notice
Improving public safety by detecting high-risk activities and investigating threats
Use case #2: Catch Insider Traders
• Multiple data sources:
• HP Digital Safe data
• Transactional trading data
• Financial news feeds
• Social media
• Email, voicemail recordings, instant messaging
• Phase 1 – complex policies such as highlighting suspect trades where no communication can be found between related Bank A and Bank B contacts
• Future Phase - Integrate HP Vertica for trend and anomaly detection
• HP IDOL eduction sends interesting data to HP Vertica for statistical analysis and slice/dice
• Combine HP Vertica’s pattern-matching and graph-analysis at scale with HP IDOL’s ability to model concepts and enrich data
This is a rolling (up to 3 year) roadmap and is subject to change without notice
Financial Services - Information Surveillance & Digital Forensics Solution
Use case #3: Smart Retail / Voice of Customer
• Multiple data sources:
• Enterprise – documents, email, ticketing systems, CRM cases, videos
• Customer – social media, blogs, forums, user-generated content, surveys
• Public – websites, news
• Phase 1:
• Sentiment detection, clustering
• Eduction – people, places, credit card numbers
• Link expansion, gender detection
• Curation, tagging, alerts
• Future Phase - Integrate HP Vertica for demographic profiling
• HP IDOL eduction sends interesting data to HP Vertica for statistical analysis and slice/dice
• Combine HP Vertica’s pattern-matching and graph-analysis at scale with HP IDOL’s ability to model concepts and enrich data
This is a rolling (up to 3 year) roadmap and is subject to change without notice
Prevent churn, analyze NPS surveys, react to product/warranty issues
Real world: claims integrity
Leading health insurance company
Business need
• Identify duplicate or inaccurate health insurance claims and transactions (e.g. overpayment)
• Multiple legacy systems containing claims data, with little integration
Solution
• Connect legacy systems and create a common index of claims data regardless of location, type or source
• Identify unusual patterns in transactions to detect fraud or error
Business benefits
• Massive ROI through reduction in duplicate claims paid
• Improved operational efficiency
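The common-index idea can be illustrated with a small sketch: records from each legacy system are mapped onto one shared schema, keyed on the fields that identify a single billable service, and any key that appears more than once is a candidate duplicate claim. The system names, field names and choice of key fields are all hypothetical, and the sketch assumes every source already stores dates in ISO format. It finds exact duplicates only and does not attempt the broader pattern analysis used to spot fraud.

```python
from collections import defaultdict

# Hypothetical mappings: each legacy system names the same claim fields differently.
FIELD_MAPS = {
    "legacy_a": {"member": "MBR_ID", "provider": "PROV", "date": "SVC_DT",
                 "code": "PROC_CD", "amount": "AMT"},
    "legacy_b": {"member": "memberId", "provider": "providerId", "date": "serviceDate",
                 "code": "procedure", "amount": "paid"},
}


def normalize(source, record):
    """Map one raw record onto the common schema (dates assumed ISO already)."""
    m = FIELD_MAPS[source]
    return {
        "source": source,
        "member": str(record[m["member"]]).strip().upper(),
        "provider": str(record[m["provider"]]).strip().upper(),
        "date": str(record[m["date"]]),
        "code": str(record[m["code"]]).strip().upper(),
        # Store money as integer cents so 120.5 and "120.50" compare equal.
        "amount_cents": round(float(record[m["amount"]]) * 100),
    }


def build_index(sources):
    """Common index: claim key -> every normalized record sharing that key."""
    index = defaultdict(list)
    for source, records in sources.items():
        for raw in records:
            c = normalize(source, raw)
            key = (c["member"], c["provider"], c["date"], c["code"], c["amount_cents"])
            index[key].append(c)
    return index


def duplicate_claims(index):
    """Keys claimed more than once, whether within one system or across systems."""
    return {key: claims for key, claims in index.items() if len(claims) > 1}


sources = {
    "legacy_a": [{"MBR_ID": "m001", "PROV": "p9", "SVC_DT": "2014-11-03",
                  "PROC_CD": "99213", "AMT": "120.50"}],
    "legacy_b": [{"memberId": "M001", "providerId": "P9", "serviceDate": "2014-11-03",
                  "procedure": "99213", "paid": 120.5},
                 {"memberId": "M002", "providerId": "P9", "serviceDate": "2014-11-04",
                  "procedure": "99214", "paid": 80}],
}
dups = duplicate_claims(build_index(sources))
```

Here the same service appears once in each system with different casing and amount formats; normalization makes the two records collide on one key, which is exactly the duplicate a siloed view would miss.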
Real world: expertise networks
Aircraft manufacturing
Business need
• Employees waste 30 min/day finding information and duplicating the work of others
• Identify expertise across a global community of 35,000 engineers
• Avoid manual approaches such as describing areas of interest and expertise in a contacts directory using predefined keywords
Solution
• Generate user profiles automatically and in real time, based on the pages visited and documents read
• Alert employees when documents or other employees match the work they are doing
Business benefits
• Reduced time spent retrieving information by over 90%
• Identified teams working on similar projects across the globe
• ROI within 7 months
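Profile-based alerting can be sketched with a plain bag-of-words model: each user's profile is the combined term counts of what they have read, and a new document is matched against every profile by cosine similarity. This is a swapped-in textbook technique for illustration, not how HP IDOL builds profiles. The 0.3 threshold, user names and documents are hypothetical, and a production system would at least remove stop words and weight terms (for example with TF-IDF).

```python
import math
import re
from collections import Counter


def terms(text):
    """Lower-cased word counts; a real system would drop stop words and weight terms."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def build_profile(documents_read):
    """A user's profile is the combined term counts of everything they have read."""
    profile = Counter()
    for doc in documents_read:
        profile.update(terms(doc))
    return profile


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def who_to_alert(profiles, new_document, threshold=0.3):
    """Users whose reading profile is similar enough to a newly published document."""
    doc_terms = terms(new_document)
    return sorted(user for user, p in profiles.items() if cosine(p, doc_terms) >= threshold)


profiles = {
    "alice": build_profile(["wing composite fatigue testing",
                            "composite layup defects in wing spars"]),
    "bob": build_profile(["cabin lighting supplier contract", "seat supplier pricing"]),
}
print(who_to_alert(profiles, "fatigue cracks in composite wing panels"))  # ['alice']
```

Because profiles are rebuilt from reading activity rather than typed in, they stay current without the manual keyword maintenance the business need calls out; the same similarity score could also rank colleagues against each other to surface teams working on similar projects.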
Summary
HP Haven Big Data platform
Gain insight from 100% of your data
• Connect to all of your machine, business, & human data sources
• Analyze at volume and velocity of data
• Develop modern data-driven applications
[Figure: Haven platform architecture]
• Applications: HP applications, customer applications, developer applications
• Haven: defined programming interfaces; analytics, context and categorization; data connectors; scalable data stores
• Data sources: social media, IT/OT, images, audio, video, mobile, search engine, email, texts, documents, transactional data, records, compliance archives
• Deployment: on-premise or in the cloud
Check out the websites: www.autonomy.com, www.vertica.com, www.hp.com