Data Design
michael-shilman -
Category
Technology
Transcript of Data Design
Data Design
2114.409: Creative Research Practice
HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/
Reflection
HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
Status Check
Concerns
Programming
What can we build
Course Outline
1. Foundations
Introduction
Survey Methods / Data Mining
Visualization and Analysis
Social Mechanics
2. Methods
Creativity and Brainstorming
Project Management
Prototyping
3. Prototyping
Crawling
Text Mining
To be determined (TBD)
Project Update
4. Refinement
Project Presentations
Reflection
TBD x3
Last Week: Building Blocks
Clustering
Classification & Regression
Association Rules
Outlier Detection
HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/
This Week: Systems
HTTPS://WWW.FACEBOOK.COM/PHOTO.PHP?FBID=407391545956901&SET=A.407391429290246.110679.100000581776191&TYPE=3&THEATER
Data Mining Overview
Visualization, Storytelling: How do I see and communicate answers?
Design, Data Exploration: What questions should I ask of the data?
Analysis Techniques: How do I clean and process the data?
Crawling, Surveys, UX Design: How do I gather meaningful data?
HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
Why might we prefer analysis?
LABOR
Too many pictures to look at.
Don’t know which are interesting.
ACCURACY
Can test for statistical significance, etc.
Some patterns don’t visualize easily.
Clustering
Find natural groupings in the data
Organize data into classes:
‣ high intra-class similarity
‣ low inter-class similarity
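As one way to make "natural groupings" concrete, here is a minimal sketch of hard clustering with k-means. The slides do not name a specific algorithm, so k-means, the farthest-point initialization, the function name `kmeans`, and the toy points are all assumptions chosen for illustration.

```python
def kmeans(points, k, iterations=10):
    """Hard clustering of 2-D points into k groups (k-means sketch)."""

    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    # Assumed initialization: start at the first point, then repeatedly
    # add the point farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))

    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        assignment = [
            min(range(k), key=lambda c: dist2(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignment, centers


# Made-up toy data: two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Points inside each group end up close to their own center (high intra-class similarity) and far from the other center (low inter-class similarity).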
Clustering
Input Data: Points OR Similarities, plus [ # of clusters ]
Output Clusters: Hard OR Soft OR Hierarchical
Classification
Learn to map objects to categories
Regression
Learn to map objects to continuous variables
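For the regression case, a minimal sketch of mapping an object (here a single number) to a continuous variable is a least-squares line fit. The slides do not specify a method, so ordinary least squares, the name `fit_line`, and the data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)


def predict(x):
    """Continuous prediction for a new input."""
    return slope * x + intercept


print(predict(4.0))
```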
Classification
Observations: X
Labels: Y
Learn f(x) = y
Example: X = height, Y = gender (Male / Female)
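The height example can be sketched as learning a single threshold. This is only an illustration: the midpoint-of-class-means rule, the name `train_threshold`, and the heights are assumptions, and real height data overlaps far more than this toy set.

```python
def train_threshold(observations, labels):
    """Learn f(x) = y for one numeric feature and exactly two labels.

    The decision boundary is the midpoint between the two class means.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two labels")
    means = {}
    for c in classes:
        values = [x for x, y in zip(observations, labels) if y == c]
        means[c] = sum(values) / len(values)
    low, high = sorted(classes, key=lambda c: means[c])
    threshold = (means[low] + means[high]) / 2
    return lambda x: high if x > threshold else low


# Made-up heights in cm.
heights = [158, 162, 165, 175, 180, 183]
genders = ["Female", "Female", "Female", "Male", "Male", "Male"]
f = train_threshold(heights, genders)
print(f(160), f(185))
```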
The Whole Process
1. Featurization: Data Set → Featurized data
2. Random Split (e.g. 90/10): Featurized data → Training Data, Test Data
3. Training: Training Data → Model
4. Evaluation: Model, Test Data → Results
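The flow from data set to results can be sketched end to end as follows. Every function name here (`featurize`, `random_split`, `train`, `evaluate`), the threshold model, and the synthetic data are assumptions; the point is only the order of the stages and that evaluation uses data the model never saw.

```python
import random


def featurize(record):
    """Raw record -> (feature, label). Here: parse a numeric string."""
    raw_value, label = record
    return float(raw_value), label


def random_split(rows, train_fraction=0.9, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def train(rows):
    """Model: threshold halfway between the two class means."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    (lo_label, lo_mean), (hi_label, hi_mean) = sorted(
        ((label, sum(v) / len(v)) for label, v in by_label.items()),
        key=lambda pair: pair[1],
    )
    threshold = (lo_mean + hi_mean) / 2
    return lambda x: hi_label if x > threshold else lo_label


def evaluate(model, rows):
    """Accuracy: fraction of rows whose predicted label matches."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)


# Synthetic, well-separated data set (made up for the sketch).
data_set = [(str(v), "low") for v in range(50)]
data_set += [(str(v), "high") for v in range(100, 150)]

featurized = [featurize(r) for r in data_set]
training_data, test_data = random_split(featurized, 0.9)
model = train(training_data)
accuracy = evaluate(model, test_data)
print(len(training_data), len(test_data), accuracy)
```

Because the two synthetic classes are separated by a wide gap, accuracy is perfect here; real data would not be this clean.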
Association Rules
Learn interesting relations in the data
supp(X) = proportion of events in which X occurs
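A small sketch of the proportion of events in which X occurs (usually called support), plus rule confidence. Confidence does not appear in the transcript; it is the standard companion measure and is added here as an assumption, as are the function names and the made-up shopping baskets.

```python
def support(events, items):
    """Proportion of events that contain every item in `items`."""
    items = set(items)
    return sum(1 for e in events if items <= set(e)) / len(events)


def confidence(events, x, y):
    """Estimate of P(Y | X): support of X and Y together over support of X."""
    return support(events, set(x) | set(y)) / support(events, x)


# Made-up events (shopping baskets).
events = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(events, {"bread"}))
print(confidence(events, {"bread"}, {"milk"}))
```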
Anomaly Detection
Detect strange events in the data
Simplest measure:
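The measure shown on the slide is not preserved in this transcript. As an assumed stand-in, a commonly used simple measure is the z-score: how many standard deviations a value lies from the mean. The function names, the threshold of 2.0, and the counts below are all illustrative.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # all values identical: nothing is unusual
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def outliers(values, threshold=2.0):
    """Values whose absolute z-score exceeds the (assumed) threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Made-up daily event counts with one strange day.
daily_counts = [10, 12, 11, 9, 10, 11, 50]
print(outliers(daily_counts))
```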
What Can We Build?
HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/
Collective Intelligence
How can we harness the activities of the world’s digital citizens to build new and useful consumer services?
Community
Articles, Images, Video
Updates, Reviews, Comments
Clicks, Scrolls, Time
Likes, Links, Checkins
Collective Intelligence
Politics
The Korean elections are coming. How does the Internet tell us more than traditional polling ever could?
Politics
What issues are important?
Who are the influencers?
How can we segment/characterize support groups?
How do we spread our opinions more widely?
Who will win the election?
HTTP://WWW.USATODAY.COM/TECH/NEWS/STORY/2012-03-05/SOCIAL-SUPER-TUESDAY-PREDICTION/53374536/1
“Can social media predict election outcomes?”
How can we build this?
Tweet: Author, Date, Body, Retweets, Hashtags
Prediction: Candidate, Location, Score, Confidence
Author: Profile, Tweets, Favorites, Following, Followers, Location
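The three records can be sketched as plain data classes. The field names follow the slide; the field types, the defaults, and the sample values are assumptions, since the slide lists names only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    profile: str
    tweets: int        # assumed: number of tweets posted
    favorites: int
    following: int
    followers: int
    location: str


@dataclass
class Tweet:
    author: Author
    date: str          # assumed: ISO date string
    body: str
    retweets: int = 0
    hashtags: List[str] = field(default_factory=list)


@dataclass
class Prediction:
    candidate: str
    location: str
    score: float       # assumed: predicted vote share between 0 and 1
    confidence: float  # assumed: value between 0 and 1


# Made-up sample values.
author = Author("student", 120, 15, 80, 200, "Seoul")
tweet = Tweet(author, "2012-11-01", "Great debate tonight", hashtags=["election"])
prediction = Prediction("Candidate A", "Seoul", 0.52, 0.7)
print(tweet.author.location, prediction.score)
```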
Clustering
Classification & Regression
Association Rules
Outlier Detection
Insert Magic Here?
Workshop
Tweet Inputs
Author Inputs
Sentiment + Candidate
Refinements
Scoring Correction based on past elections
RMSE Evaluation
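RMSE (root mean squared error) compares predicted scores against actual outcomes. A minimal sketch, with made-up vote shares and an assumed function name `rmse`:

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up predicted and actual vote shares for two candidates.
predicted_share = [0.52, 0.48]
actual_share = [0.50, 0.50]
print(rmse(predicted_share, actual_share))
```

Lower is better; an RMSE of 0 would mean every predicted score matched the outcome exactly.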
System Overview
Tweet + Label
N-Gram Features
Training Process
Input Observation
Output Label
Confusion Matrix
Evaluation
Feature Extractor
Classifier
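The pieces named on this slide can be sketched together: a feature extractor producing n-gram features, a classifier trained on tweet + label pairs, and a confusion matrix for evaluation. The slides do not say which classifier was used, so multinomial naive Bayes with add-one smoothing is an assumption, as are all function names and the tiny made-up training set.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Feature extractor: lowercase word n-grams for n = 1..n_max."""
    words = text.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(words) - n + 1)
    ]


def train(examples):
    """Naive Bayes over n-gram counts; returns a classify(text) function."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for feat in ngrams(text):
            feature_counts[label][feat] += 1
            vocabulary.add(feat)

    def classify(text):
        best_label, best_score = None, -math.inf
        for label, count in label_counts.items():
            total = sum(feature_counts[label].values())
            score = math.log(count / len(examples))  # log prior
            for feat in ngrams(text):
                if feat in vocabulary:  # skip n-grams never seen in training
                    smoothed = feature_counts[label][feat] + 1
                    score += math.log(smoothed / (total + len(vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return classify


def confusion_matrix(classifier, examples):
    """Count (actual label, predicted label) pairs."""
    return Counter((label, classifier(text)) for text, label in examples)


# Made-up labeled tweets.
training = [
    ("love this candidate", "positive"),
    ("great speech tonight", "positive"),
    ("terrible debate answer", "negative"),
    ("hate this policy", "negative"),
]
classifier = train(training)
held_out = [("love the speech", "positive"), ("terrible policy", "negative")]
matrix = confusion_matrix(classifier, held_out)
print(matrix)
```

Off-diagonal entries of the matrix, such as ("positive", "negative"), count the mistakes; with this toy data there are none.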
Sentiment Detail
Entertainment
HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/
Food Movements
Shopping
Collaboration
HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/
Travel
Investing
Medicine
Trust
HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/
HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/
HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/
HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/
HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/
HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/
HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/
HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/
Homework: Data Mining
1. Form groups!
2. Choose a Collective Intelligence topic from Lecture 1, or propose similar.
3. Make a list of data sources that might provide insights to that topic.
4. Propose a set of meaningful questions about the data based on your intuition.
5. How would you have to clean/process your data to start answering those questions?
6. Consider clustering, association rules, anomaly detection, classification. For each technique, how might you apply it to the data and what would it show?
7. Document your work and be prepared to present.
HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
Feedback