Data Design

Post on 13-Jan-2015

description

Combining data mining building blocks to build real systems.

Transcript of Data Design

Data Design

2114.409: Creative Research Practice

HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/

Reflection

HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/

Status Check

Concerns

Programming

What can we build

Course Outline

Introduction

Survey Methods / Data Mining

Visualization and Analysis

Social Mechanics

1. Foundations

Creativity and Brainstorming

Project Management

Prototyping

2. Methods

Crawling

Text Mining

To be determined (TBD)

Project Update

3. Prototyping

Project Presentations

Reflection

TBD x3

4. Refinement

Last Week: Building Blocks

Clustering

Classification & Regression

Association Rules

Outlier Detection

HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/

Data Mining Overview

Visualization, Storytelling

Design, Data Exploration

Analysis Techniques

Crawling, Surveys, UX Design

How do I see and communicate answers?

What questions should I ask of the data?

How do I clean and process the data?

How do I gather meaningful data?

HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/

Why might we prefer analysis?

LABOR

Too many pictures to look at.

Don’t know which are interesting.

ACCURACY

Can test for statistical significance, etc.

Some patterns don’t visualize easily.

Clustering

Find natural groupings in the data

Organize data into classes:

‣ high intra-class similarity

‣ low inter-class similarity

Clustering

Input Data: Points OR Similarities, plus [ # of clusters ]

Output Clusters: Hard OR Soft OR Hierarchical
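As a rough illustration of the "points in, hard clusters out" case, here is a minimal k-means sketch in plain Python. The slides do not name an algorithm, so k-means is an assumption, as are the function name `kmeans`, the choice of the first k points as starting centers, and the made-up toy points.

```python
def kmeans(points, k, iterations=20):
    """Hard-cluster 2-D points into k groups (basic k-means, assumed here)."""
    centers = list(points[:k])  # simple deterministic start: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, clusters

# Made-up toy points forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centers, clusters = kmeans(points, k=2)
```

Each cluster ends up with high intra-class similarity (its three points sit close together) and low inter-class similarity (the two groups are far apart). A soft or hierarchical output would need a different method.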

Classification

Learn to map objects to categories

Regression

Learn to map objects to continuous variables
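To make the regression case concrete, here is a minimal sketch that fits a straight line by ordinary least squares. The slide does not specify a model, so the linear form, the name `fit_line`, and the toy numbers are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```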

Classification

Observations: X

Labels: Y

Learn: f(x) = y

Example: X = height, Y = gender (Male / Female)
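One way to picture "learn f(x) = y" for the height example is a nearest-mean classifier: store the average height per label, then predict the label whose average is closest. This is only a sketch under assumptions. The slides do not say which classifier is meant, and the heights below are invented for illustration.

```python
def train(observations, labels):
    """Learn one mean observation per label (a nearest-mean classifier)."""
    totals, counts = {}, {}
    for x, y in zip(observations, labels):
        totals[y] = totals.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: totals[y] / counts[y] for y in totals}

def predict(model, x):
    """f(x) = the label whose learned mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

# Invented toy observations (heights in cm) with their labels.
heights_cm = [178.0, 183.0, 175.0, 162.0, 158.0, 166.0]
genders = ["Male", "Male", "Male", "Female", "Female", "Female"]
model = train(heights_cm, genders)
```

With this toy data, `predict(model, 185.0)` falls on the "Male" side and `predict(model, 160.0)` on the "Female" side. A single feature like height separates real data far less cleanly.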

The Whole Process

Data Set → Featurization → Featurized Data Set

Random Split (e.g. 90/10) → Training Data + Test Data

Training Data → Training → Model

Model + Test Data → Evaluation → Results
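The stages of this process can be sketched end to end in a few lines. Everything concrete here is an assumption made for illustration: the text-length feature, the nearest-mean model, accuracy as the evaluation result, and the synthetic records.

```python
import random

def featurize(record):
    """Featurization: turn a raw record into a (feature, label) pair."""
    return float(len(record["text"])), record["label"]

def random_split(rows, train_fraction=0.9, seed=0):
    """Random split: shuffle a copy, then cut into training and test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train(rows):
    """Training: the model is the mean feature value per label."""
    totals, counts = {}, {}
    for x, y in rows:
        totals[y] = totals.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: totals[y] / counts[y] for y in totals}

def evaluate(model, rows):
    """Evaluation: fraction of test rows whose label is predicted correctly."""
    correct = sum(
        1 for x, y in rows
        if min(model, key=lambda label: abs(x - model[label])) == y
    )
    return correct / len(rows)

# Synthetic data set: ten short texts and ten long texts.
data_set = (
    [{"text": "a" * n, "label": "short"} for n in range(1, 11)]
    + [{"text": "a" * n, "label": "long"} for n in range(21, 31)]
)
featurized = [featurize(r) for r in data_set]
training_data, test_data = random_split(featurized)  # 90/10 split
model = train(training_data)
accuracy = evaluate(model, test_data)
```

The test data is never shown to `train`, so the accuracy reflects how the model handles unseen observations.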

Association Rules

Learn interesting relations in the data

= proportion of events in which X occurs
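In standard association-rule terminology, the proportion of events in which X occurs is called the support of X. The sketch below computes it, together with the confidence of a rule X → Y, which is a common companion measure that this slide does not show. The shopping-basket data and the function names are made up for illustration.

```python
def support(transactions, items):
    """Proportion of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, x, y):
    """Of the transactions containing x, the share that also contain y."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

# Made-up events: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
bread_support = support(transactions, {"bread"})                   # 3 of 4 baskets
bread_to_milk = confidence(transactions, {"bread"}, {"milk"})      # 2 of those 3
```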

Anomaly Detection

Detect strange events in the data

Simplest measure:
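The measure named on this slide did not survive in the transcript. As one common simple choice (an assumption, not necessarily the slide's), the sketch below flags values whose z-score, the distance from the mean in standard deviations, exceeds a threshold. The threshold of 2.0 and the sample values are illustrative.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Made-up measurements with one obviously strange event.
values = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = zscore_outliers(values)
```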

What Can We Build?

HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/

Collective Intelligence

How can we harness the activities of the world’s digital citizens to build new and useful consumer services?

Community

Articles, Images, Video

Updates, Reviews, Comments

Clicks, Scrolls, Time

Likes, Links, Checkins

Collective Intelligence

Politics

The Korean elections are coming. How does the Internet tell us more than traditional polling ever could?

Politics

What issues are important?

Who are the influencers?

How can we segment/characterize support groups?

How do we spread our opinions more widely?

Who will win the election?

Tweet: Author, Date, Body, Retweets, Hashtags

Prediction: Candidate, Location, Score, Confidence

Author: Profile, Tweets, Favorites, Following, Followers, Location
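The three records on this slide could be written down as plain data classes, as sketched below. The field names follow the slide, but the grouping of fields under Tweet, Prediction, and Author is a reading of the flattened diagram, and every type and default value is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Author:
    profile: str
    location: str
    favorites: int = 0
    following: int = 0
    followers: int = 0
    tweets: List["Tweet"] = field(default_factory=list)

@dataclass
class Tweet:
    author: Author
    date: str   # assumed to be an ISO date string
    body: str
    retweets: int = 0
    hashtags: List[str] = field(default_factory=list)

@dataclass
class Prediction:
    candidate: str
    location: str
    score: float
    confidence: float

# Hypothetical example records.
author = Author(profile="example_user", location="Seoul", followers=120)
tweet = Tweet(author=author, date="2012-01-01", body="Example text", hashtags=["election"])
author.tweets.append(tweet)
prediction = Prediction(candidate="Candidate A", location="Seoul", score=0.5, confidence=0.8)
```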

Clustering

Classification & Regression

Association Rules

Outlier Detection

Insert Magic Here?

Workshop

Tweet Inputs

Author Inputs

Sentiment + Candidate

Refinements

Scoring Correction based on past elections

RMSE Evaluation
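For the RMSE evaluation step, a minimal sketch: root-mean-square error is the square root of the average squared difference between predicted and actual values. The vote-share numbers below are hypothetical and only show the call.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical predicted vs. actual vote shares (percent) for three candidates.
predicted = [52.0, 45.0, 3.0]
actual = [50.0, 47.0, 3.0]
error = rmse(predicted, actual)  # errors of 2, -2 and 0 points
```

A lower value means the predicted scores track the real outcome more closely, which is what a scoring correction based on past elections would aim to improve.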

System Overview

Tweet + Label

N-Gram Features

Training Process

Input Observation

Output Label

Confusion Matrix

Evaluation

Feature Extractor

Classifier

Sentiment Detail
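Two pieces of this system are easy to sketch: the n-gram feature extractor and the confusion matrix used for evaluation. The tokenization (lowercase, split on whitespace), the use of unigrams plus bigrams, and the "pos"/"neg" labels are assumptions. The classifier that sits between the two steps is left out.

```python
from collections import Counter

def ngram_features(text, n=2):
    """Count every word n-gram of length 1..n in the text."""
    tokens = text.lower().split()
    grams = Counter()
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            grams[" ".join(tokens[i:i + size])] += 1
    return grams

def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

# Feature extraction on a made-up tweet body.
features = ngram_features("I love this candidate")

# Evaluation on made-up classifier output.
matrix = confusion_matrix(
    ["pos", "pos", "neg", "neg"],   # true labels
    ["pos", "neg", "neg", "neg"],   # predicted labels
)
```

Here `matrix[("pos", "neg")]` counts positive tweets that were wrongly labeled negative, which is the kind of error a plain accuracy figure hides.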

Entertainment

HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/

Food Movements

Shopping

Collaboration

HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/

Travel

Investing Medicine Trust

HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/

HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/

HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/

HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/

HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/

HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/

HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/

HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/

Homework: Data Mining

1. Form groups!

2. Choose a Collective Intelligence topic from Lecture 1, or propose a similar one.

3. Make a list of data sources that might provide insights into that topic.

4. Propose a set of meaningful questions about the data based on your intuition.

5. How would you have to clean/process your data to start answering those questions?

6. Consider clustering, association rules, anomaly detection, classification. For each technique, how might you apply it to the data and what would it show?

7. Document your work and be prepared to present.

HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/

Feedback