Transforming Social Data into Business Insight - IBM · 2016-04-05 · Transforming Social Data...

Transforming Social Data into Business Insights Marie Wallace, Vincent Burckhardt


Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law

Notices and Disclaimers

2

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

•IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

Notices and Disclaimers cont.

3

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.


Please Note:

4

You are custodian of the most valuable data within the enterprise IF you can release it for business value

Are you an Analytics Rockstar?

5

6

Organizations with a highly engaged workforce significantly outperform those without

The shift to digital now makes analysis of engagement networks possible


7

Can we use analytics to better understand employee engagement and its impact on the business?

Capture & Understand your Enterprise Network

8

Management Employee


ODPi (Open Data Platform Initiative, odpi.org)

10

ODPi is an industry effort to promote and advance the state of Apache Hadoop and Big Data technology for the enterprise. It currently has 24 member companies.

IBM is a founding member of ODPi and is one of 4 members to release a data platform based on the ODP core: IBM Open Platform.

Priorities:
− Certifications for ODPi compatible distributions
− Guidelines for ODPi ISVs and consumers
− Introduce more big data projects into ODPi

Data Exchange

Data Scientist & Developer Platform Services

Analytic Services

Data Processing & Management

IBM Open Platform (ibm.biz/ibmopenplatform)

11

IBM Engagement Analytics (ibm.com/engage)

12


Helps each employee better understand their engagement and reputation, and activate their network more effectively for maximum value

The Personal Social Dashboard

14

Activity: Measure of your activity

Reaction: Measure of how people respond to your activity

Eminence: Measure of how people respond to you

Network: Measure of the quality of your network and your role within it

Helps management better understand overall engagement and organizational health, identify issues and act accordingly

– Shows connectivity within & between teams

– Identifies people who play key roles

– Highlights organizational brittleness

The Organizational Dashboard

15

Organizational Health

16

Many analyses are actionable, with recommendations

17

Understand your engagement & reputation within the social network

Act on your personal recommendations to drive improvement

Based on Recommendation Templates & Network Analysis:

− Employee Matching: Based on a person’s social activity, determine if, and to what level, they fit a specific social engagement trait

− Template Instantiation: Generate recommendations that, if followed, can change and strengthen their engagement patterns

Innovation & Advocacy

18

#1 Collaboration Does Impact Business Outcome
• Engaged employees are 120% more likely to generate Innovation and 150% more likely to demonstrate Customer Advocacy

#2 Optimal Behavior is Different for Everyone
• A variety of interactions most effectively contribute to business outcome

#3 Discovering & Disseminating Optimal Behaviors is Key to Improving Business Outcome
• The Personal Social Dashboard provides such a channel

Employee Retention

19

Does engagement change prior to an attrition event?
− Analyzed organizational, social, and retention data
− Inspected 10,000 random employees as a control group and 1188 employees who quit

Yes! And engagement analytics can help to predict attrition events
− Social Behavior Patterns: less engaged with differences in types of activity
− Volume of Activity: less activity several months prior to attrition event
− Network: Attrition is viral (common manager, passive and active network


Transforming discrete data into insights

http://techproductmanagement.com/wp-content/uploads/2014/03/BigData.jpg

Big Data Analytics

22

[Figure: many discrete pieces of data feed Analytics, which produces Business Insights]

Our scope: making sense of the data

23

Date/time          Latitude     Longitude
01/10/2015 15:30   53.3330556   -6.2488889

Single location

24

Date/time          Latitude     Longitude
...                ...          ...
01/10/2015 16:15   53.3330556   -6.2488889
01/10/2015 16:30   53.4         -6.4666667
01/10/2015 16:45   53.4         -6.4666667
01/10/2015 17:00   53.4         -6.4666667
01/10/2015 17:15   53.4         -6.4666667
01/10/2015 15:45   53.4         -6.4666667
01/10/2015 15:45   53.3330556   -6.4666667
...                ...          ...

Where the person lives
− House, apartment, ...
− Type of neighbourhood

Where the person works
− Potential income indications
− Type of work

Where the person shops
− Type of supermarket
− Practice sport (cycling, running)

...

Locations for one person over one year

25

Date/time          Latitude     Longitude     Person
...                ...          ...           ...
01/10/2015 16:15   53.3330556   -6.2488889    Vincent
01/10/2015 16:15   48.623881    7.747846      Bob
01/10/2015 16:15   28.497371    -81.407531    Sally
01/10/2015 16:15   53.4         -6.4666667    James
01/10/2015 16:30   53.4         -6.4666667    Vincent
01/10/2015 16:30   48.623881    7.747846      Bob
01/10/2015 16:30   28.497371    -81.407531    Sally
...                ...          ...           ...

Social connections (2 or more people at the same location on a regular basis)

− Build general patterns to predict preferences and behaviors

− People who live in X and shop in Y tend to like Z

Locations for multiple people over one year
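
The "2 or more people at the same location on a regular basis" idea can be sketched as a simple grouping exercise. This is a minimal illustration only: the sample rows, the function name `co_located_pairs`, the `min_meetings` threshold, and the exact-coordinate match are all assumptions made for the example, not part of the original material.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample rows in the slide's format:
# (date/time, latitude, longitude, person).
SIGHTINGS = [
    ("01/10/2015 16:15", 53.3330556, -6.2488889, "Vincent"),
    ("01/10/2015 16:15", 53.4, -6.4666667, "James"),
    ("01/10/2015 16:30", 53.4, -6.4666667, "Vincent"),
    ("01/10/2015 16:30", 53.4, -6.4666667, "James"),
    ("01/10/2015 16:45", 53.4, -6.4666667, "Vincent"),
    ("01/10/2015 16:45", 53.4, -6.4666667, "James"),
    ("01/10/2015 16:45", 48.623881, 7.747846, "Bob"),
]


def co_located_pairs(sightings, min_meetings=2):
    """Return pairs of people seen at the same place and time
    at least `min_meetings` times."""
    people_at = defaultdict(set)
    for when, lat, lon, person in sightings:
        # Exact coordinate equality is a simplification; real location
        # data would need a distance threshold instead.
        people_at[(when, lat, lon)].add(person)

    meetings = Counter()
    for people in people_at.values():
        for pair in combinations(sorted(people), 2):
            meetings[pair] += 1
    return {pair: n for pair, n in meetings.items() if n >= min_meetings}


print(co_located_pairs(SIGHTINGS))
```

With these sample rows, only Vincent and James share a location more than once, so they are the only pair reported as a likely social connection.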

26

IBM Connections

Social events

Business Insights

Analytics

Collaboration tool ... that lets you “look under the hood”
− Connections generates discrete events about who did what in the system at a very granular level
− Applying analytics to a large number of events makes it possible to define patterns, statistics ... and business insights

Value of IBM Connections

27

Extracting meaningful data from your social platform

Home page: See what's happening across your social network

Communities: Work with people who share common roles and expertise

Files: Post, share, and discover documents, presentations, images, and more

Micro-blogging: Reach out to your social network for help

Profiles: Find the people you need

Wikis: Create web content together

Activities: Organize your work and tap your professional network

Bookmarks: Save, share, and discover bookmarks

Blogs: Present your own ideas, and learn from others

Forums: Exchange ideas with, and benefit from the expertise of, others

IBM Connections

29

IBM Connections provides APIs and SPIs that allow the value of the social data to be maximized by external systems:

− ALL Connections data can be accessed by external systems

− Open, transparent, breaking down silos

Pull data from IBM Connections
− Programmatically access much of the same information that you can through the IBM Connections user interface

Have Connections push data to you
− All data change (CUD) events in all IBM Connections components can be supplied to external consumers

Connections Maximizes The Value of Social Data

30

[Architecture diagram. Components shown: IBM Connections Apps, Common Services, Directory, Search, Person Card, Navigational Header, JMX / WSAdmin Administration, User Directory, RDB, File System]

Connections Architecture

31

[Architecture diagram, with the REST API added. Clients shown: Browser, Feed Reader, Sametime, Portlets, Lotus Notes, Microsoft Office, Mashups, Your App. They reach IBM Connections through the HTTP Server & Proxy Cache using HTML, HTML Form, JavaScript, JSON, and the Connections Atom API (REST API: GET, POST, PUT, DELETE on Atom Feeds and Atom Entries)]

Connections Architecture

32

[Architecture diagram, with the Event SPI added. Alongside the REST API and Connections Atom API clients, the diagram now shows the Event SPI, an Integration bus, Other Enterprise Services, and Your App]

Connections Architecture

33

Designed to allow 3rd parties to get notified whenever a data change happens in any of the IBM Connections services

− Real-time events generated by IBM Connections include all create, update, and delete (CUD) operations

− Potential to represent the complete interaction footprint of the enterprise

− Allows you to capture, persist, model, analyze, visualize and monetize your enterprise network

SPI (System Programming Interface) vs API (Application Programming Interface)

− An SPI sits at a lower level than an API: you contribute Java code at system level

− By contributing Java code written to this SPI, 3rd parties can listen to creation, deletion and update (and more!) events of content within IBM Connections

The Event SPI is the social data fire-hose

34

Events: collections of data generated when activities (data-modifying, notifications) occur in IBM Connections

− In the SPI, an event is represented by a Java bean / object

− An event encapsulates data such as the type of action and the object (and container) involved in the action

Events are delivered to Event Handlers:
− An event handler is a Java class implemented by a 3rd party (you!)
− Event handlers are registered in an XML file (events-config.xml)
− The registration specifies what type of event to send to a given handler
− Connections delivers the Java bean representing the event to the registered event handler(s)

[Diagram: the Event SPI reads events-config.xml and delivers events to Handler 1, Handler 2, ... Handler N]

Event SPI – Programming aspects

35

The Event SPI relies on event handlers written in Java to allow vendors to listen to and process events generated by the system

− Running external (untrusted) code on Cloud is not possible

− Running 3rd party code on the same WebSphere servers as our applications is not safe

− Multitenancy issues

Introducing Switchbox
− Our plan is to allow customers/vendors to listen to events generated for their own organization on our Cloud applications without running code on our system
− Already leveraged by compliance solutions
− Currently being implemented for broader consumption, not available as of now

Cloud considerations

36

Reliable delivery mechanism
− Delivery at least once; supports and recovers from network failure
− Latency tolerant

Ease of transition between on-premise and Cloud

− Java event handlers implemented for the Event SPI can be run by the Switchbox client

− The main difference is that the event handlers are deployed and run on customer infrastructure, outside the IBM Connections datacenter

− The Switchbox client invokes event handlers upon reception of an event

Base for generation of events from most IBM social apps (Sametime)
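
One practical consequence of "delivery at least once" is that a handler may see the same event more than once, so it helps if handlers are idempotent. The sketch below illustrates that idea in Python under stated assumptions: the `IdempotentHandler` class and the `"id"` field are hypothetical, and nothing here reflects an actual Switchbox interface.

```python
class IdempotentHandler:
    """Wraps a handler so that redelivered events are processed only once."""

    def __init__(self, handle):
        self._handle = handle
        # In-memory for the demo; a real deployment would need a
        # persistent, bounded store of seen ids.
        self._seen_ids = set()

    def on_event(self, event):
        event_id = event["id"]  # Assumption: each event carries a unique id.
        if event_id in self._seen_ids:
            return False  # Duplicate delivery: skip it.
        self._handle(event)
        # Mark as seen only after success, so that a failed event is
        # processed again when it is redelivered.
        self._seen_ids.add(event_id)
        return True


processed = []
handler = IdempotentHandler(processed.append)
for event in [{"id": "e1"}, {"id": "e2"}, {"id": "e1"}]:  # "e1" arrives twice
    handler.on_event(event)

print([e["id"] for e in processed])
```

The duplicate "e1" is dropped, so downstream analytics count each event once even when the delivery mechanism retries.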

[Diagram: in the IBM Connections Cloud infrastructure, the Event SPI feeds a Switchbox handler and the Switchbox server; on the customer infrastructure, the Switchbox client invokes Handler 1 and Handler 2]

Switchbox is not currently available. This diagram shows our desire to provide such a solution, allowing customers to consume events from their own organization on Cloud

Cloud considerations

37

blog.entry.created: “Amy Jones posted a blog entry in the blog named XYZ”

Actor: The person who initiated this action.
Details: External id, name and, if not disabled, email address

Type: Type of action
Example: CREATE, UPDATE, DELETE, NOTIFY, MEMBERSHIP, ..

Item: General concept for representing an individual entity within a container
Details: id, name, textual content, HTML and ATOM paths

Container: General concept for representing a "bucket" or "container" that contains other items
Details: id, name

Event SPI – available data in each event
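
To make the Actor / Type / Item / Container structure concrete, here is a minimal Python model of the example event. It is a sketch only: the class and field names (`external_id`, `content`, and so on) and the id values are invented for illustration and do not mirror the Java beans of the Event SPI.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Actor:
    external_id: str
    name: str
    email: Optional[str] = None  # Only present if not disabled


@dataclass
class Item:
    id: str
    name: str
    content: str = ""


@dataclass
class Container:
    id: str
    name: str


@dataclass
class Event:
    name: str   # e.g. "blog.entry.created"
    type: str   # e.g. CREATE, UPDATE, DELETE, NOTIFY, MEMBERSHIP
    actor: Actor
    item: Item
    container: Container


# "Amy Jones posted a blog entry in the blog named XYZ"
event = Event(
    name="blog.entry.created",
    type="CREATE",
    actor=Actor(external_id="u-001", name="Amy Jones"),
    item=Item(id="entry-1", name="My first post", content="Hello"),
    container=Container(id="blog-1", name="XYZ"),
)

# asdict() converts nested dataclasses too, so the event serializes cleanly.
payload = json.dumps(asdict(event))
print(payload)
```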

38

Many more data fields encapsulated in events:
− Correlation item set to represent parent-child relationship (events about commenting action)
− Target set, which allows interactions between content and people to be deduced
− Membership delta field, indicating who has been added/removed from a community, activity, ...
− ... see Event SPI documentation for full list (JavaDoc)

Key point: the event model encapsulates all the data needed to understand the interaction between people, content and containers in the platform

Event SPI – available data in each event

39

Challenges of analytics:

Large incoming event stream
− More than 100 CUD events per second
− Growing over the longer term
− Scalable framework for analysis
− Horizontal scale to address growth

(Near) real-time indexing

No data loss

Event SPI in the context of an analytic solution

40

Analysis, even basic, is time consuming, thus:

Analysis should not occur in the event handler, but in an external system (“Analytics Service”)

The event handler should not wait until the analytic service processes the event

− Waiting would result in an accumulation of events at the Connections level

− This is problematic because the Connections queue that retains events to be delivered to the event handler has a limited depth

=> Design the event handler to consume and process events as fast as possible, i.e. as the interface between IBM Connections and an external system

[Diagram: Event SPI, then Event Handler, then “Data backbone” (storage for asynchronous processing), then Analytics Service]

Goal: retain as many events as possible for further analysis

Taming the fire-hose... (1/2)
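
The hand-off pattern described here can be sketched with an in-process queue standing in for the data backbone. This is illustrative Python, not Event SPI code: `handle_event`, `analytics_worker`, the `None` sentinel, and the second event name are assumptions for the demo, and a real backbone would be a broker or database rather than `queue.Queue`.

```python
import queue
import threading

backbone = queue.Queue()  # Stand-in for the "data backbone".
results = []


def handle_event(event):
    """Event handler: hand the event off and return immediately."""
    backbone.put(event)


def analytics_worker():
    """Analytics service: consumes from the backbone at its own pace."""
    while True:
        event = backbone.get()
        if event is None:  # Sentinel used to stop this demo worker.
            break
        # Placeholder for slow analysis work.
        results.append(event["name"].upper())


worker = threading.Thread(target=analytics_worker)
worker.start()

for name in ["blog.entry.created", "wiki.page.updated"]:
    handle_event({"name": name})

backbone.put(None)  # Stop the worker once the demo events are queued.
worker.join()
print(results)
```

The handler never blocks on analysis, so slow processing shows up as a longer backbone queue rather than as back-pressure on Connections.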

41

Characteristics of the data backbone
− Distributed and highly available
− Horizontal scale
− High throughput
− Agnostic to consumers' state

Multiple options
− Message broker: MQ / MQTT / ActiveMQ / Apache Kafka
− Database
− ...

Taming the fire-hose... (2/2)

42

Send JSON representation of the event. Serialization to JSON through Open Source GSON library

Java class implementing the EventHandler interface

Integration with a message broker – Apache Kafka
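
The slide describes a Java handler that serializes each event to JSON (with GSON) and sends it to Kafka. The same shape can be sketched in Python with a stand-in producer, so that it runs without a broker. Everything named here is hypothetical: `InMemoryProducer`, `BrokerForwardingHandler`, the `send(topic, value)` signature, and the topic name are assumptions for illustration, not the Kafka client API or the Event SPI.

```python
import json


class InMemoryProducer:
    """Stand-in for a message-broker producer; records what would be sent."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


class BrokerForwardingHandler:
    """Forwards each event to the broker as JSON, with no analysis."""

    def __init__(self, producer, topic):
        self._producer = producer
        self._topic = topic

    def handle_event(self, event):
        # Serialize to JSON bytes and forward to the broker.
        value = json.dumps(event, sort_keys=True).encode("utf-8")
        self._producer.send(self._topic, value)


producer = InMemoryProducer()
handler = BrokerForwardingHandler(producer, topic="connections-events")
handler.handle_event({"name": "blog.entry.created", "type": "CREATE"})
print(producer.sent)
```

Keeping the producer behind a small interface like this also makes the handler easy to test without a running broker.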

43

Registration – through events-config.xml

Java class implementing EventHandler interface

Subscriptions define the events delivered by the SPI to the event handler.

Filtered by event name, source (IBM service), and/or type (CREATE, UPDATE, DELETE, ...)

Properties: name/value pairs injected into the event handler Java class. Typically used to pass configuration settings

Integration with a message broker – Apache Kafka
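
The subscription filtering described above (by event name, source, and/or type) can be modelled in a few lines. This is a conceptual sketch in Python: the `Subscription` class, the `"*"` wildcard convention, the rule that every specified field must match, and the source values `BLOGS` and `WIKIS` are assumptions for illustration, not the actual events-config.xml semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    event_name: str = "*"
    source: str = "*"
    type: str = "*"

    def matches(self, event):
        # Assumption: "*" matches anything; all specified fields must match.
        return all(
            wanted in ("*", event.get(key))
            for key, wanted in (
                ("name", self.event_name),
                ("source", self.source),
                ("type", self.type),
            )
        )


def dispatch(event, registrations):
    """Return the handlers that have at least one matching subscription."""
    return [
        handler_name
        for handler_name, subscriptions in registrations.items()
        if any(s.matches(event) for s in subscriptions)
    ]


registrations = {
    "kafka-forwarder": [Subscription()],  # receives everything
    "blog-audit": [Subscription(source="BLOGS", type="CREATE")],
}

blog_event = {"name": "blog.entry.created", "source": "BLOGS", "type": "CREATE"}
wiki_event = {"name": "wiki.page.updated", "source": "WIKIS", "type": "UPDATE"}
print(dispatch(blog_event, registrations))
print(dispatch(wiki_event, registrations))
```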

44

Deployment – jar and dependencies made available to the SPI (running in the IBM Connections News application) through a Shared Library in WebSphere Application Server

Integration with a message broker – Apache Kafka

45

IBM Connections provides OpenSocial Activity Streams APIs that allow 3rd parties to push their own events to the Activity Stream

Since Connections 4.5:
− Events pushed through the Activity Stream APIs are also surfaced in the Event SPI
− An option makes it possible to NOT surface an event in the Activity Stream APIs, i.e. to surface it only through the Event SPI

=> 3rd party applications can also participate in the social analytics graph simply by publishing to the Connections Activity Stream APIs

3rd party events can also participate in the social analytics solution

46

Good news:
− In most cases, events surface all the data needed for analytics purposes (including the content the event is about)
− Events about the same object repeat data

− If there are X events about the same object, the item/correlation data set will always contain the most up-to-date information about the referenced object

For an analytic solution, in a nutshell, this means that the Event SPI should be sufficient in most cases

You can “pull” all data from Connections... but is it really needed?

Pulling data – when is it needed ?

47

“Push” approach (Event SPI) is sufficient to build most analytic solutions
− All necessary content (textual content, tags, …) is surfaced in every single event
− All operations changing relationships (i.e. adding/removing a member, colleague, follower) are surfaced as events

“Pull” (REST APIs) approaches should stay limited to:

1. “Bootstrapping” the Analytics Service based on a Connections system with data existing prior to the introduction of the event handler used in your analytic solution
− Essentially building membership/network data (as needed)
− Seeding the content should not be needed, as it is repeated whenever an event about the content is generated

2. Fetching data not available through the Event SPI
− Relatively “rare” for events generated from Connections

Pulling data – when is it needed ?

48

2 main approaches for pulling data from Connections

1. REST APIs (Atom / OpenSocial format)
− REST-style HTTP based APIs (XML, JSON format)
− Transparency: programmatically access much of the same information that can be accessed through the IBM Connections UI
− “Drink your own champagne”: public APIs used internally by plug-ins, mobile … and even some components of the Web UI (Activity Stream, Activities, …)

2. Seedlist
− Designed to allow crawling of Connections data for indexing purposes by a search engine
− Surfaces all content in the system, so it can be of some value for an analytic solution
− HTTP based APIs (Atom XML format)

Pulling data from Connections

49

Example: /forums/seedlist/myserver returns ALL forum entries in the system
− Textual content, author, number of comments, number of recommendations, parent id, ACL

Seedlist
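
Since the Seedlist is served as Atom XML, consuming it comes down to parsing Atom entries. The sketch below parses a small, made-up Atom feed with Python's standard library. The sample feed and the `parse_entries` helper are assumptions for illustration: a real Seedlist response carries additional fields (such as comment counts and ACLs) that are not modelled here, and fetching and paging are left out.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical, heavily simplified feed in plain Atom format.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>forum-entry-1</id>
    <title>How do I enable the Event SPI?</title>
    <author><name>Amy Jones</name></author>
    <content type="text">Looking for pointers.</content>
  </entry>
  <entry>
    <id>forum-entry-2</id>
    <title>Re: How do I enable the Event SPI?</title>
    <author><name>Bob</name></author>
    <content type="text">See the documentation.</content>
  </entry>
</feed>"""


def parse_entries(atom_xml):
    """Extract id, title, author and text content from each Atom entry."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        entries.append(
            {
                "id": entry.findtext(f"{ATOM}id"),
                "title": entry.findtext(f"{ATOM}title"),
                "author": entry.findtext(f"{ATOM}author/{ATOM}name"),
                "content": entry.findtext(f"{ATOM}content"),
            }
        )
    return entries


entries = parse_entries(SAMPLE_FEED)
for e in entries:
    print(e["id"], "by", e["author"])
```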

50

REST APIs support basic authentication, form-based authentication and (for most APIs) OAuth

Private data: strict enforcement of access on API calls

− Not very convenient for access by an analytic system...

“Super user”
− Concept of “super user”: access control checks on private data are bypassed
− On-premise: the “super user” is a user mapped to the JEE “admin” role across all Connections services
− On Cloud: impersonation support can help to fetch data for a range of users (progressively being disclosed)

Authentication aspects for the REST APIs

51

In some very specific cases, data is not available in a form easily consumable to build an analytic solution

− Example: getting the list of followers for a given object in the system
− Query the Connections databases directly (in these specific cases only)
− The database schema is private and can change over time

REST APIs (Atom / OS APIs)
Pros:
• Fine granularity: access content / meta-data for a specific object / container
• Access relationship information: APIs are available for fetching membership lists, network information, who liked a given object, ...
Cons:
• Lack of batch retrieval capabilities

Seedlist
Pros:
• Batch retrieval of textual content
• Incremental updates (but the Event SPI is much more suitable for this purpose)
Cons:
• Focused around content - does not expose all the data (missing tags membership information)

Pulling data from Connections – What to use, when?

52

Leverage the Event SPI as much as possible
− Provides (most of) the data needed for any elaborate analytics solution
− Just let Connections push data to you! Easier, and performs well

“Fill the gaps” by pulling data from the Atom/Seedlist APIs
− Initial loading of relationship / content data
− Data not available through the Event SPI

One final warning:
− An analytic solution has access to private data through the Event SPI and the Atom/Seedlist APIs (with admin role)

=> Ensure your solution is not leaking private data to unauthorized users

Key Points

53

Analytics and Connections data

54

55

Credit: Paco Nathan

Key parts of typical analytic pipeline

56

IBM Connections is the event source for the pipeline:

* Extract: Consume events

* Transform: Transform format

* Load: Load transformed data to database / disk

* Clean data (fetch specific data fields from events, assign unique id to objects)

* Represent social relationship as graph

A property graph has:
− vertices and edges, each of which can have any number of properties
− directed relationships

Graph structure is ideal to represent relationships between entities (people, objects)
− Context around the event
− Cause and effect of an event
− Artefacts related to an event

[Diagram: example graph with vertices Person A, Person B, Status Update and Comment, connected by "creates" and "comments on" edges]

Representing Connections data as graph
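
A property graph of this kind can be sketched with plain Python collections. The example below is a minimal illustration, not a graph database: the `PropertyGraph` class, the vertex ids, the `at` timestamps, and the exact wiring of the edges (Person A creates a status update, Person B creates a comment on it) are assumptions made for the example.

```python
class PropertyGraph:
    """Tiny in-memory property graph with directed, labelled edges."""

    def __init__(self):
        self.vertices = {}  # vertex id -> properties
        self.edges = []     # (source id, label, target id, properties)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, source, label, target, **properties):
        self.edges.append((source, label, target, properties))

    def out_neighbors(self, source, label):
        """Targets reachable from `source` over edges with this label."""
        return [t for s, l, t, _ in self.edges if s == source and l == label]

    def in_neighbors(self, target, label):
        """Sources that point at `target` over edges with this label."""
        return [s for s, l, t, _ in self.edges if t == target and l == label]


graph = PropertyGraph()
graph.add_vertex("person-a", kind="Person", name="Person A")
graph.add_vertex("person-b", kind="Person", name="Person B")
graph.add_vertex("status-1", kind="Status Update")
graph.add_vertex("comment-1", kind="Comment")
graph.add_edge("person-a", "creates", "status-1", at="2015-10-01T16:15")
graph.add_edge("person-b", "creates", "comment-1", at="2015-10-01T16:30")
graph.add_edge("comment-1", "comments on", "status-1")

print(graph.out_neighbors("person-a", "creates"))
print(graph.in_neighbors("status-1", "comments on"))
```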

59


Query the graph to generate insights: activity, eminence, reaction, network. Store a score per user and per organization
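
As a rough illustration of turning graph edges into per-user scores, the sketch below computes two simplified measures. The definitions are assumptions made for the example only: "activity" is taken as the number of items a person created, and "reaction" as the number of comments their items received from other people. The actual scoring used in the solution is not described in this material.

```python
from collections import Counter

# Hypothetical edges: (source, label, target).
EDGES = [
    ("amy", "creates", "status-1"),
    ("bob", "creates", "comment-1"),
    ("comment-1", "comments on", "status-1"),
    ("amy", "creates", "status-2"),
]


def scores(edges):
    """Return (activity, reaction) score dictionaries keyed by person."""
    # Map each item to the person who created it.
    creator = {target: source for source, label, target in edges
               if label == "creates"}

    activity = Counter(source for source, label, _ in edges
                       if label == "creates")

    reaction = Counter()
    for source, label, target in edges:
        if label == "comments on":
            commenter, owner = creator[source], creator[target]
            if commenter != owner:  # Ignore comments on one's own content.
                reaction[owner] += 1

    return dict(activity), dict(reaction)


activity, reaction = scores(EDGES)
print("activity:", activity)
print("reaction:", reaction)
```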


API / UI to surface scores generated in previous step

Volume: ~500 bytes per event + bulk data => 180 GB per hour, 4.3 TB per day

Velocity: 100s of events per second

Variety: Semi-structured data; time and spatial aspects; easy to represent as graph

Veracity: Not an issue with Connections; the veracity of events from Connections can be trusted

4 dimensions of Big Data
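
As a back-of-the-envelope check on the volume dimension, the event payload alone can be computed under explicit assumptions: exactly 100 events per second (the low end of "100s") and 500 bytes per event, with bulk data excluded.

```python
EVENTS_PER_SECOND = 100  # Assumed low end of "100s of events per second"
BYTES_PER_EVENT = 500    # "~500 bytes per event", bulk data excluded

bytes_per_hour = EVENTS_PER_SECOND * BYTES_PER_EVENT * 3600
bytes_per_day = bytes_per_hour * 24

print(f"{bytes_per_hour / 1e6:.0f} MB per hour")
print(f"{bytes_per_day / 1e9:.2f} GB per day")
```

Under these assumptions the event stream itself amounts to roughly 180 MB per hour and 4.32 GB per day, so figures in the GB-per-hour and TB-per-day range would imply a much higher event rate or a large contribution from bulk data.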

62

63


Value of collaboration data:
− From discrete events to generating deep insights about people, network … the whole organization
− Key insights by leveraging Big Data Analytics on events
− Insights only limited by data and your own ability to process it

IBM Connections has its own powerful set of APIs to access most interactions in the system
− Fully available on premise
− Being unlocked on Cloud

Analytic platform available (IBM Open Platform)
− Get started with IBM Open Platform and build on top of it

Key points

66

IBM Open Platform @ ibm.biz/ibmopenplatform

IBM Engagement Analytics @ ibm.com/engage

Event SPI @ ibm.biz/eventspi w/ Java Doc @ ibm.biz/eventspijavadoc

SocialBiz User Group @ www.socialbizug.org

Follow us on Twitter @IBMConnect, @IBMSocialBiz, @marie_wallace

LinkedIn @ ibm.biz/socbizlinkedin; participate in our Social Business group

Facebook @ www.facebook.com/IBMSocialBiz; give us a Like

Social Business Insights Blog @ ibm.com/blogs/socialbusiness; join the conversation!

More resources online

67

Thank you

68
