Wrangling Messy Data - A True Story

Transcript of Wrangling Messy Data - A True Story

Page 1: Wrangling Messy Data - A True Story

Public

Jason Cao/SAP Digital Experience Marketing – Las Vegas

Elizabeth Imm/SAP Market Introduction Services – Berlin

EA104 – Wrangling Messy Data: A True Story

Page 2: Wrangling Messy Data - A True Story

© 2014 SAP SE or an SAP affiliate company. All rights reserved. 2Public

Disclaimer

This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP's strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.

Page 3: Wrangling Messy Data - A True Story

Agenda

The Setting – The SAP Community Network (SCN)

The Players – Understanding Roles in Data Analysis

The Plot – Introducing Gamification on SCN: The goals for gamifying SCN

The Conflict – Messy data & other data problems

The Resolution – Approaching the problem: 3 SAP Lumira use cases with SCN Gamification data

The End – “And they lived happily ever after”… or do they? What’s next for SCN’s Gamification team?

Page 4: Wrangling Messy Data - A True Story

The SAP Community Network

Page 5: Wrangling Messy Data - A True Story

About SAP Community Network (SCN): A High-Tech, Professional Community

An 11-year-old, mature community: open to all; need to encourage quality contributions

A place to grow reputation: contributions showcase expertise; a serious game

Page 6: Wrangling Messy Data - A True Story

Gamification on SCN

Page 7: Wrangling Messy Data - A True Story

What is Gamification: Definition

Gamification is the use of game-thinking and game mechanics in a non-game context in order to engage users and solve problems. Gamification is used in applications and processes to improve user engagement, ROI, data quality, timeliness, and learning.

Source: Wikipedia

Page 8: Wrangling Messy Data - A True Story

Why Introduce Advanced Game Mechanics?

Boost Participation

Build Reputation

Inject Fun

Drive Behaviors

Address Challenges with Old System

Page 9: Wrangling Messy Data - A True Story

The SCN Experience – Goal: Boost Participation

Registration:

Missions and badges for first-time login and onboarding

Contributions:

Missions and badges for first contributions and more

Page 10: Wrangling Messy Data - A True Story

The SCN Experience – Goal: Drive Behaviors

Ethics:

Requirements in missions to read the rules of engagement and how to search

Feedback:

Pay It Forward mission to reward positive engagement

Page 11: Wrangling Messy Data - A True Story

The SCN Experience – Goal: Build Reputation

Quality:

Missions that require not only content creation but also receiving good feedback on the content

Quality:

Prerequisite of a certain level before you can earn some missions

Page 12: Wrangling Messy Data - A True Story

The SCN Experience – Goal: Inject Fun

Hidden missions, unexpected recognition

SCN 10-year anniversary

Data Geek Challenge

Page 13: Wrangling Messy Data - A True Story

Messy Data & Other Data Problems

Page 14: Wrangling Messy Data - A True Story

Mission Completion (Badges Awarded)

Page 15: Wrangling Messy Data - A True Story

Member Activity Comparison (Total Action Count)

Page 16: Wrangling Messy Data - A True Story

Increase in Active Users (Those who logged an action)

[Chart: weekly active users (members who logged an action); x-axis APR-28-2013 to MAY-26-2013; y-axis 0 to 40,000]

Active Users (Logged an Action)

WK 1: 19,445

WK 2: 21,728

WK 3: 22,623

WK 4: 36,084

143%

Page 17: Wrangling Messy Data - A True Story

The Plot Thickens…

Duplicate data

Senseless data

Disparate data

Corrupted data
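To make these problem types concrete, here is a small screening pass in plain Python. It is a sketch under stated assumptions, not the SCN team's actual pipeline and not an SAP Lumira feature: the field names (`member_id`, `action`, `ts`), the timestamp format, and the sample rows are all hypothetical. It drops exact duplicates, rows with no member ID (senseless), and rows whose timestamp cannot be parsed (corrupted). Disparate data is a joining problem and is not handled here.

```python
from datetime import datetime

# Hypothetical activity-log rows; field names and values are illustrative only.
ROWS = [
    {"member_id": "1001", "action": "blog_post", "ts": "2013-05-06 09:15"},
    {"member_id": "1001", "action": "blog_post", "ts": "2013-05-06 09:15"},  # duplicate
    {"member_id": "", "action": "like", "ts": "2013-05-07 10:00"},           # senseless: no member
    {"member_id": "1002", "action": "like", "ts": "2013-13-45 99:99"},       # corrupted timestamp
    {"member_id": "1003", "action": "comment", "ts": "2013-05-08 14:30"},
]

TS_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def is_valid(row):
    """Return True if the row names a member and has a parseable timestamp."""
    if not row["member_id"].strip():
        return False
    try:
        datetime.strptime(row["ts"], TS_FORMAT)
    except ValueError:
        return False
    return True


def clean(rows):
    """Drop exact duplicates (keeping the first occurrence), then invalid rows."""
    seen = set()
    result = []
    for row in rows:
        key = (row["member_id"], row["action"], row["ts"])
        if key in seen:
            continue
        seen.add(key)
        if is_valid(row):
            result.append(row)
    return result


cleaned = clean(ROWS)
print(f"{len(ROWS)} rows in, {len(cleaned)} clean rows out")
```

Running it prints `5 rows in, 2 clean rows out`. On real data the validity rules are the hard part; they encode what "sensible" means for a given question.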

Page 18: Wrangling Messy Data - A True Story

Roles in Data Analysis

Page 19: Wrangling Messy Data - A True Story

People of Many Hats

DECISION MAKERS: Leverage Knowledge

ANALYSTS: Activate Data

DESIGNERS: Build Data Assets

Page 20: Wrangling Messy Data - A True Story

Approaching the Problem

Page 21: Wrangling Messy Data - A True Story

Whiteboarding for our Proof of Concept

Page 22: Wrangling Messy Data - A True Story

Result of Initial Data Cleansing

Total: 100M rows

3M clean rows

Sensible data…

Page 23: Wrangling Messy Data - A True Story

Seven Common Data Analysis Tasks

Data analyst workflows often consist of seven high-level activity groups:

• The activities are iterative, not linear

• They do not all carry equal weight or take equal time

The seven tasks: Find, Wrangle, Profile, Enhance, Visualize, Predict, Share

SAP Lumira rooms shown on the diagram: Data Acquisition, Prepare Room (and Object Picker), Visualize/Compose Room, Predict Room, Share Room

Page 24: Wrangling Messy Data - A True Story

Seven High-level Data Analysis Tasks

Find – Examples:

• Data Connectivity / Opening Files

• Knowing who to ask


Page 25: Wrangling Messy Data - A True Story

Seven High-level Data Analysis Tasks

Wrangle – Examples:

• Merge datasets

• Clean data

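Merging datasets usually means joining on a shared key and then checking what failed to match. The sketch below assumes two hypothetical in-memory datasets (badge awards and member profiles) sharing a hypothetical `member_id` key. It performs a left join in plain Python and collects awards that have no matching profile, which is often where disparate sources disagree.

```python
# Hypothetical datasets from two sources, sharing a member_id key.
members = [
    {"member_id": "1001", "country": "DE"},
    {"member_id": "1002", "country": "US"},
]
badges = [
    {"member_id": "1001", "badge": "First Blog Post"},
    {"member_id": "1001", "badge": "Pay It Forward"},
    {"member_id": "1003", "badge": "First Login"},  # no matching profile
]


def left_join(left, right, key):
    """Keep every left row; copy in fields from the right row with the same key.

    Assumes `key` is unique in `right`. Unmatched left rows get None for the
    right-hand fields.
    """
    index = {row[key]: row for row in right}
    right_fields = sorted({f for row in right for f in row if f != key})
    joined = []
    for row in left:
        match = index.get(row[key], {})
        merged = dict(row)
        for field in right_fields:
            merged[field] = match.get(field)
        joined.append(merged)
    return joined


awards = left_join(badges, members, "member_id")
orphans = [r for r in awards if r["country"] is None]
print(f"{len(awards)} awards merged, {len(orphans)} without a member profile")
```

A left join is chosen here so that no award silently disappears; the unmatched rows are surfaced for cleaning instead of being dropped.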

Page 26: Wrangling Messy Data - A True Story

Seven High-level Data Analysis Tasks

Profile:

• Comparing data (e.g., spelling)

• Leading spaces or zeros

• Structured and unstructured data

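The profiling checks listed for this task (spelling variants, leading spaces or zeros) can be sketched with a few string operations. The values and the `normalize_id` and `profile` helpers below are hypothetical illustrations in plain Python, not SAP Lumira functions.

```python
from collections import Counter

# Hypothetical raw values as they might arrive from different source systems.
raw_ids = ["001001", " 1001", "1001 ", "1002"]
raw_countries = ["Germany", "germany ", "GERMANY", "Deutschland", "USA"]


def normalize_id(value):
    """Trim surrounding spaces and leading zeros so '001001' and ' 1001' match."""
    stripped = value.strip().lstrip("0")
    return stripped or "0"  # an all-zero ID stays "0" instead of becoming empty


def profile(values):
    """Count distinct values after trimming whitespace and ignoring case."""
    return Counter(v.strip().casefold() for v in values)


print(sorted({normalize_id(v) for v in raw_ids}))  # ['1001', '1002']
print(profile(raw_countries).most_common())
```

The second line prints `[('germany', 3), ('deutschland', 1), ('usa', 1)]`. Trimming and case-folding collapse the three spellings of Germany, but a synonym such as `Deutschland` still shows up separately; that is the kind of finding that feeds a later Replace step.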

Page 27: Wrangling Messy Data - A True Story

Seven High-level Data Analysis Tasks

Enhance:

• Add/Remove columns

• Hierarchies and calculations

• Geography and time
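Enhancing a dataset often means deriving new columns from existing ones. The sketch below assumes hypothetical `day`, `points`, and `actions` columns and adds an ISO year/week pair (a simple two-level time hierarchy) plus a calculated points-per-action column, using only the Python standard library.

```python
from datetime import date

# Hypothetical rows; "day", "points", and "actions" are illustrative column names.
rows = [
    {"member_id": "1001", "day": "2013-04-28", "points": 40, "actions": 8},
    {"member_id": "1002", "day": "2013-05-06", "points": 15, "actions": 0},
]


def enhance(row):
    """Return a copy of `row` with a time hierarchy and a calculated column added."""
    iso_year, iso_week, _ = date.fromisoformat(row["day"]).isocalendar()
    out = dict(row)
    out["year"] = iso_year  # time hierarchy, top level
    out["week"] = iso_week  # time hierarchy, second level (ISO week number)
    # Calculated column; None when there were no actions, to avoid dividing by zero.
    out["points_per_action"] = row["points"] / row["actions"] if row["actions"] else None
    return out


enhanced = [enhance(r) for r in rows]
for r in enhanced:
    print(r["member_id"], r["year"], r["week"], r["points_per_action"])
```

This prints `1001 2013 17 5.0` and `1002 2013 19 None`. ISO weeks start on Monday, so a Sunday such as 2013-04-28 belongs to the week that began the previous Monday; use a different week definition if your reports count Sunday as day one.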

Page 28: Wrangling Messy Data - A True Story

Seven High-level Data Analysis Tasks

Visualize:

• Creating charts and visualizations

• (Note: a table is considered a visualization as well)


Page 29: Wrangling Messy Data - A True Story

Seven High-level Data Analysis Tasks


Page 30: Wrangling Messy Data - A True Story

Seven High-level Data Analysis Tasks

Share:

• Cloud

• Server

• Email (visualization and dataset)

Page 31: Wrangling Messy Data - A True Story

Sharing with Lumira Cloud and Lumira Server

Desktop

IT

SAP Lumira Server

Mobile/Web

Page 32: Wrangling Messy Data - A True Story

3 Data Workflows for SAP Lumira Server

Lumira Datasets (real-time or static)

Lumira Stories

SAP Lumira

SAP HANA Studio

Page 33: Wrangling Messy Data - A True Story

Use Case 1

How are missions influencing community adoption?

Page 34: Wrangling Messy Data - A True Story

Playing Around With Visualizations

Too many segments to be useful! Interesting visualization, but irrelevant!

Page 35: Wrangling Messy Data - A True Story

Data Preparation Basics

Replace, Filter, Rank
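Replace, Filter, and Rank can be chained as in the plain-Python sketch below. Everything in it is an assumption for illustration: the mission labels, the counts, the replacement map, and the `min_count` and `top_n` cut-offs are hypothetical, and this is not how SAP Lumira implements these operations. Summing after the Replace step is what merges spelling variants of the same mission into one total.

```python
# Hypothetical badge counts per raw mission label (illustrative numbers only).
badge_counts = [
    ("onboarding_login", 900),
    ("Onboarding-Login", 150),
    ("first_blog", 400),
    ("test_mission", 5),
    ("pay_it_forward", 300),
]

# Replace: map raw labels, including spelling variants, to one display name.
REPLACEMENTS = {
    "onboarding_login": "First Login",
    "Onboarding-Login": "First Login",
    "first_blog": "First Blog Post",
    "pay_it_forward": "Pay It Forward",
}


def replace_filter_rank(rows, replacements, min_count=10, top_n=3):
    """Replace labels, drop small totals, and return the top_n by count."""
    totals = {}
    for label, count in rows:
        name = replacements.get(label, label)  # Replace (unmapped labels kept as-is)
        totals[name] = totals.get(name, 0) + count
    kept = {name: c for name, c in totals.items() if c >= min_count}  # Filter
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)  # Rank
    return ranked[:top_n]


for rank, (mission, count) in enumerate(replace_filter_rank(badge_counts, REPLACEMENTS), start=1):
    print(rank, mission, count)
```

With these sample values the output is `1 First Login 1050`, `2 First Blog Post 400`, `3 Pay It Forward 300`; the `test_mission` label falls below the filter threshold.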

Page 36: Wrangling Messy Data - A True Story

Insight to Help Us Focus on What Matters

Email reminders add limited value to onboarding.

Stop mass emails and focus on closing the onboarding gap.

Better use of resources to focus on activating members.

Mass email campaign

Page 37: Wrangling Messy Data - A True Story

Use Case 2

What’s the best day to launch missions on SCN?

Page 38: Wrangling Messy Data - A True Story

Data Preparation Basics

Group, Replace
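For the launch-day question, one simple approach is to replace each date with its weekday name and group on that. A minimal sketch, assuming a hypothetical list of ISO-formatted award dates; a count per weekday is only one possible signal, since the timing of a launch is not the only factor in mission success.

```python
from collections import Counter
from datetime import date

# Hypothetical mission award dates (ISO format); not real SCN data.
award_dates = ["2013-05-06", "2013-05-06", "2013-05-07", "2013-05-13", "2013-05-11"]


def awards_by_weekday(iso_dates):
    """Replace each date with its weekday name, then group and count."""
    counts = Counter()
    for iso in iso_dates:
        weekday = date.fromisoformat(iso).strftime("%A")  # e.g. "Monday" in the C locale
        counts[weekday] += 1
    return counts


print(awards_by_weekday(award_dates).most_common())
```

In the default C locale this prints `[('Monday', 3), ('Tuesday', 1), ('Saturday', 1)]`. `date.fromisoformat` needs Python 3.7 or later, and `%A` returns localized names if the process locale has been changed.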

Page 41: Wrangling Messy Data - A True Story

Increasing Mission Design Maturity

Design + planning can influence mission success.

Revise launch practice to include timing and promotion.

More mature launch plan, with greater community awareness.

Page 42: Wrangling Messy Data - A True Story

Use Case 3

Can we reduce bad behavior?

~ quoted from SCN discussion thread

Page 43: Wrangling Messy Data - A True Story

Observations | 9 Months After Launch

Current Challenges:

• Increased point cheating and plagiarism (perceived as disproportionate to the increase in activity).

• Dissatisfaction among loyal, well-established members and newer members alike.

• High operational effort needed to address these issues.

Opportunities:

• Strengthen quality requirements in badged missions.

• Further de-emphasize quantity and points in favor of quality and meaningful engagement.

• Consider a two-step transformation approach.

Page 44: Wrangling Messy Data - A True Story

Data Preparation Basics

Group, Hierarchy, and Filter
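A hierarchy plus a filter can show where a member's points come from. The sketch below rolls hypothetical point events up a space → member → action hierarchy, then filters for members whose share of points from Likes exceeds a hypothetical 30% threshold. The data, the field order, and the threshold are assumptions for illustration only; a flag is a prompt for moderator review, not proof of cheating.

```python
from collections import defaultdict

# Hypothetical point events: (space, member, action, points). Not real SCN data.
events = [
    ("SD", "m1", "like", 1),
    ("SD", "m1", "like", 1),
    ("SD", "m1", "like", 1),
    ("SD", "m1", "blog_post", 5),
    ("SD", "m2", "answer", 10),
    ("HANA", "m3", "like", 1),
    ("HANA", "m3", "answer", 10),
]


def rollup(rows):
    """Group points into a space -> member -> action hierarchy."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for space, member, action, points in rows:
        tree[space][member][action] += points
    return tree


def flag_members(tree, action="like", threshold=0.3):
    """Filter: (space, member) pairs whose share of points from `action` exceeds threshold."""
    flagged = []
    for space, members in tree.items():
        for member, by_action in members.items():
            total = sum(by_action.values())
            if total and by_action.get(action, 0) / total > threshold:
                flagged.append((space, member))
    return flagged


print(flag_members(rollup(events)))
```

This prints `[('SD', 'm1')]`: 3 of m1's 8 points come from Likes, while m3 earns only 1 of 11 that way. Using `.get` in the filter avoids inserting empty entries into the `defaultdict` hierarchy while reading it.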

Page 45: Wrangling Messy Data - A True Story

Turning Data Into Insight

Page 46: Wrangling Messy Data - A True Story

Feedback from Moderators

“Wow! These features really did wonders. Most of the people we reported from SD space are gone, that itself indicates the impact.” (Jyoti Prakash)

“I especially think the removal of points from Likes has eliminated about 2/3 of the point games.” (Michael Appleby)

78% of Moderators who responded to our poll report that changes had a positive impact

Page 47: Wrangling Messy Data - A True Story

Creating Awareness for Community Challenges

We can influence specific member behaviors.

Create missions that harness gaming energy toward desired behaviors.

Greater awareness of community topics. Good behaviors.

Page 48: Wrangling Messy Data - A True Story

Conclusion

Focus on your goal: cut out the rest.

The more we know, the more questions we have.

Start with the end in mind.

Start with the endin mind.

Page 49: Wrangling Messy Data - A True Story

Epilogue

Page 50: Wrangling Messy Data - A True Story

Get into the Game!

while (true)

Page 51: Wrangling Messy Data - A True Story

SAP d-code Virtual Hands-on Workshops and SAP d-code Online: Continue your SAP d-code education after the event!

SAP d-code Online: Access replays of keynotes, Demo Jam, SAP d-code live interviews, select lecture sessions, and more! Hands-on replays

http://sapdcode.com/online

SAP d-code Virtual Hands-on Workshops: Access hands-on workshops post-event, starting January 2015. Complimentary with your SAP d-code registration

http://sapdcodehandson.sap.com

Page 52: Wrangling Messy Data - A True Story

Further Information

SAP Education and Certification Opportunities: www.sap.com/education

Watch SAP d-code Online: www.sapdcode.com/online

SAP Public Web: scn.sap.com/community/lumira, www.saplumira.com, www.sap.com/LearnBI

Page 53: Wrangling Messy Data - A True Story

Feedback

Please complete your session evaluation for session EA104.

Jason Cao. Follow me on Twitter: @JayChaos

Thanks for attending this SAP TechEd && d-code session.

Page 54: Wrangling Messy Data - A True Story

© 2014 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.