Data Design

Data Design 2114.409: Creative Research Practice HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/


Combining data mining building blocks to build real systems.


Page 1: Data Design

Data Design

2114.409: Creative Research Practice

HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/

Page 2: Data Design

Reflection

HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/

Status Check

Concerns

Programming

What can we build

Page 3: Data Design

Course Outline

Introduction

Survey Methods / Data Mining

Visualization and Analysis

Social Mechanics

1. Foundations

Creativity and Brainstorming

Project Management

Prototyping

2. Methods

Crawling

Text Mining

To be determined (TBD)

Project Update

3. Prototyping

Project Presentations

Reflection

TBD x3

4. Refinement

Page 4: Data Design

Last Week: Building Blocks

Clustering

Classification & Regression

Association Rules

Outlier Detection

HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/

Page 6: Data Design

Data Mining Overview

Visualization, Storytelling: How do I see and communicate answers?

Design, Data Exploration: What questions should I ask of the data?

Analysis Techniques: How do I clean and process the data?

Crawling, Surveys, UX Design: How do I gather meaningful data?

Page 7: Data Design

HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/

Why might we prefer analysis?

LABOR

Too many pictures to look at.

Don’t know which are interesting.

ACCURACY

Can test for statistical significance, etc.

Some patterns don’t visualize easily.

Page 8: Data Design

Clustering

Find natural groupings in the data

Organize data into classes:

‣ high intra-class similarity

‣ low inter-class similarity
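One way to see "high intra-class similarity, low inter-class similarity" in practice is a tiny k-means run. The sketch below is only an illustration: k-means is one clustering method among many, and the sample points, the function name `kmeans`, and the choice to seed centroids with the first k points are all assumptions made here, not part of the slides.

```python
import math

def kmeans(points, k, iterations=20):
    """Tiny k-means (Lloyd's algorithm) for 2-D points; returns (centroids, labels)."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Points that share a label sit close together, and the two centroids end up far apart, which is the property the slide describes.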

Page 9: Data Design

Clustering

Input Data: Points OR Similarities, plus [ # of clusters ]

Output Clusters: Hard OR Soft OR Hierarchical

Page 10: Data Design

Classification

Learn to map objects to categories

Regression

Learn to map objects to continuous variables

Page 11: Data Design

Classification

Observations X, Labels Y

Learn f(x) = y

Example: X = height, Y = gender (Male / Female)
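A minimal sketch of "learning f(x) = y" for the height example might look like the following. It assumes a single-threshold rule (a decision stump), which is a deliberately simple stand-in for a real classifier; the heights, labels, and function names are hypothetical.

```python
def train_threshold(heights, labels):
    """Pick the height cutoff that misclassifies the fewest training examples."""
    best_cut, best_errors = None, len(heights) + 1
    for cut in sorted(set(heights)):
        # Rule under test: predict "Male" when height >= cut, else "Female".
        errors = sum((h >= cut) != (y == "Male") for h, y in zip(heights, labels))
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut

def predict(cut, height):
    """The learned f(x): map a height (x) to a label (y)."""
    return "Male" if height >= cut else "Female"

# Hypothetical labeled observations (heights in cm).
heights = [158, 162, 165, 170, 174, 178, 181, 185]
labels = ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"]

cut = train_threshold(heights, labels)
print(cut, predict(cut, 180), predict(cut, 160))
```

Real height data overlaps between groups, so a single cutoff would make errors; that is why evaluation on held-out data matters.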

Page 12: Data Design

The Whole Process

Data Set -> Featurization -> Featurized data

Featurized data -> Random Split (e.g. 90/10) -> Training Data, Test Data

Training Data -> Training -> Model

Model + Test Data -> Evaluation -> Results
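The whole process can be sketched end to end in a few lines. Everything concrete here is an assumption for illustration: the raw records, the nearest-class-mean model, the fixed random seed, and the helper names `featurize`, `fit`, and `predict`.

```python
import random

# Hypothetical raw data set: (raw height string, label).
raw = [(f"{h}cm", "Female") for h in range(150, 170, 2)] + \
      [(f"{h}cm", "Male") for h in range(172, 192, 2)]

def featurize(record):
    """Featurization: turn a raw record into (numeric feature, label)."""
    text, label = record
    return float(text.replace("cm", "")), label

featurized = [featurize(r) for r in raw]

# Random split, 90% training / 10% test (seeded so the run is repeatable).
rng = random.Random(0)
rng.shuffle(featurized)
split = len(featurized) * 9 // 10
train, test = featurized[:split], featurized[split:]

def fit(examples):
    """Training: the model is just the mean feature value per label."""
    sums = {}
    for x, y in examples:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}

def predict(model, x):
    """Predict the label whose mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

model = fit(train)

# Evaluation: accuracy on the held-out test data only.
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(len(train), len(test), accuracy)
```

The key design point is that the test data never touches training, so the reported result reflects how the model behaves on unseen examples.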

Page 13: Data Design

Association Rules

Learn interesting relations in the data

Support(X) = proportion of events in which X occurs
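The proportion of events in which X occurs is commonly called the support of X. A small sketch of computing it is given here; the shopping-basket events are hypothetical, and the `confidence` helper is a commonly used companion measure added as an assumption, not something stated on the slide.

```python
def support(events, items):
    """Proportion of events that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(event) for event in events) / len(events)

def confidence(events, x, y):
    """Of the events containing x, the fraction that also contain y."""
    return support(events, set(x) | set(y)) / support(events, x)

# Hypothetical events (e.g. shopping baskets).
events = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(events, {"bread"}))
print(support(events, {"bread", "milk"}))
print(confidence(events, {"bread"}, {"milk"}))
```

A rule such as "bread implies milk" is usually considered interesting only when both its support and its confidence clear some chosen thresholds.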

Page 14: Data Design

Anomaly Detection

Detect strange events in the data

Simplest measure:
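The measure shown on the slide did not survive in this transcript. As a stand-in, the sketch below assumes one common simple choice, the z-score (distance from the mean in standard deviations); the data, the threshold of 2.0, and the function names are hypothetical.

```python
import statistics

def zscores(values):
    """Distance of each value from the mean, in standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values identical: nothing can be unusual.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def outliers(values, threshold=2.0):
    """Values whose absolute z-score exceeds the threshold (an assumed cutoff)."""
    return [v for v, z in zip(values, zscores(values)) if abs(z) > threshold]

# Hypothetical daily tweet counts with one strange day.
daily_tweets = [10, 12, 11, 9, 10, 13, 11, 60]
print(outliers(daily_tweets))
```

Note that a single extreme value also inflates the mean and standard deviation, so more robust measures (for example, ones based on the median) are often preferred on real data.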

Page 15: Data Design

What Can We Build?

HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/

Page 16: Data Design

Collective Intelligence

How can we harness the activities of the world’s digital citizens to build new and useful consumer services?

Community

Articles, Images, Video

Updates, Reviews, Comments

Clicks, Scrolls, Time

Likes, Links, Checkins

Collective Intelligence

Page 17: Data Design

Politics

The Korean elections are coming. How does the Internet tell us more than traditional polling ever could?

Page 18: Data Design

Politics

What issues are important?

Who are the influencers?

How can we segment/characterize support groups?

How do we spread our opinions more widely?

Who will win the election?

Page 20: Data Design

Tweet: Author, Date, Body, Retweets, Hashtags

Prediction: Candidate, Location, Score, Confidence

Author: Profile, Tweets, Favorites, Following, Followers, Location

Building blocks: Clustering, Classification & Regression, Association Rules, Outlier Detection

Insert Magic Here?

Page 21: Data Design

Workshop

Page 22: Data Design

System Overview

Tweet Inputs

Author Inputs

Sentiment + Candidate

Refinements

Scoring Correction based on past elections

RMSE Evaluation
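RMSE (root mean squared error) compares predicted numbers with actual ones. A minimal sketch, assuming predicted and actual vote shares as the quantities being compared (the figures below are invented purely for illustration):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical vote shares (percent) for two candidates.
predicted_share = [52.0, 48.0]
actual_share = [50.0, 50.0]

error = rmse(predicted_share, actual_share)
print(error)
```

Lower is better, and because errors are squared before averaging, a few large misses are penalized more heavily than many small ones.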

Page 23: Data Design

Sentiment Detail

Tweet + Label

N-Gram Features

Training Process

Input Observation

Output Label

Confusion Matrix Evaluation

Feature Extractor

Classifier
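The sentiment pipeline (n-gram feature extractor, classifier, confusion-matrix evaluation) can be sketched roughly as follows. The classifier here is a deliberately naive n-gram count scorer chosen for brevity, not the method used in the course; the example tweets, labels, and function names are all hypothetical.

```python
from collections import Counter

def ngrams(text, n_max=2):
    """Feature extractor: word unigrams and bigrams of a tweet."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def train(examples):
    """Training: count how often each n-gram appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(ngrams(text))
    return counts

def classify(model, text):
    """Pick the label whose training n-grams overlap most with the tweet."""
    feats = ngrams(text)
    return max(model, key=lambda label: sum(model[label][f] for f in feats))

def confusion_matrix(model, examples):
    """Count (true label, predicted label) pairs on labeled examples."""
    return Counter((label, classify(model, text)) for text, label in examples)

# Hypothetical labeled tweets.
training = [
    ("great speech today", "positive"),
    ("love this candidate", "positive"),
    ("terrible debate performance", "negative"),
    ("hate this policy", "negative"),
]
held_out = [
    ("great candidate", "positive"),
    ("terrible policy", "negative"),
    ("great debate performance", "positive"),
]

model = train(training)
matrix = confusion_matrix(model, held_out)
for (truth, guess), count in sorted(matrix.items()):
    print(f"true={truth} predicted={guess}: {count}")
```

The off-diagonal entries of the confusion matrix show which kinds of mistakes the classifier makes, which is more informative than a single accuracy number.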

Page 24: Data Design

Entertainment

HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/

Food

Movements

Shopping

Collaboration

HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/

Travel

Investing

Medicine

Trust

HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/

HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/

HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/

HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/

HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/

HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/

HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/

Page 25: Data Design

Homework: Data Mining

1. Form groups!

2. Choose a Collective Intelligence topic from Lecture 1, or propose a similar one.

3. Make a list of data sources that might provide insights into that topic.

4. Propose a set of meaningful questions about the data based on your intuition.

5. How would you have to clean/process your data to start answering those questions?

6. Consider clustering, association rules, anomaly detection, classification. For each technique, how might you apply it to the data and what would it show?

7. Document your work and be prepared to present.

HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/

Page 26: Data Design

Feedback