Using Machine Learning to Monitor Collaborative Interactions

Carolyn Penstein Rosé

Language Technologies Institute / Human-Computer Interaction Institute

VMT-Basilica (Kumar & Rosé, 2010)

Download tools at:
http://www.cs.cmu.edu/~cprose/TagHelper.html
http://www.cs.cmu.edu/~cprose/SIDE.html

Monitoring Collaboration with Machine Learning Technology

TagHelper

[Figure: Labeled Texts → A Model that can Label More Texts; Unlabeled Texts → Labeled Texts. Plot of Behavior over Time with a <Triggered Intervention> marker.]

TagHelper Tools and SIDE

TagHelper Tools uses text mining technology to automate annotation of conversational data

SIDE facilitates rapid prototyping of reporting interfaces for group learning facilitators

Define Summaries

Annotate Data

Visualize Annotated Data

http://www.cs.cmu.edu/~cprose/TagHelper.html
http://www.cs.cmu.edu/~cprose/SIDE.html

Important caveat!!

Machine learning isn’t magic

But it can be useful for identifying meaningful patterns in your data when used properly

Proper use requires insight into your data

Naïve Approach: When all you have is a hammer…

[Diagram labels: Target, Representation, Data]

Problem: there isn’t one universally best approach!!!!!

Slightly less naïve approach: Aimless wandering…

Problem 1: It takes too long!!!

Problem 2: You might not realize all of the options that are available to you!

Expert Approach: Hypothesis driven

You might end up with the same solution in the end, but you’ll get there faster.

Today we’ll start to learn how!

What is machine learning?

Automatically or semi-automatically:
  Inducing concepts (i.e., rules) from data
  Finding patterns in data
  Explaining data
  Making predictions

[Diagram: Data → Learning Algorithm → Model; New Data → Classification Engine → Prediction]

How does machine learning work?

The simplest rule learner will learn to predict whatever is the most frequent result class. This is called the majority class.

What will the rule be in this case?

It will always predict yes.
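The majority-class learner can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the label list (9 "yes", 5 "no") is an assumed stand-in for the slide's table, and the names `zero_r` and `predict` are hypothetical.

```python
from collections import Counter


def zero_r(labels):
    """Return the most frequent class label (the majority class)."""
    return Counter(labels).most_common(1)[0][0]


# Assumed labels for illustration only; the slide's own table is not reproduced.
labels = ["yes"] * 9 + ["no"] * 5
majority = zero_r(labels)


def predict(_instance):
    # The learner ignores every feature and always answers with the majority class.
    return majority


print(predict({"outlook": "sunny"}))
```

Because the rule never looks at the features, it is mainly useful as a baseline that any more sophisticated model should beat.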

A slightly more sophisticated rule learner will find the feature that gives the most information about the result class. What do you think that would be in this case?

Outlook:
  Sunny -> No
  Overcast -> Yes
  Rainy -> Yes

<Feature Name>:
  <value> -> <prediction>
  <value> -> <prediction>
  …

What will be the prediction?

Outlook:
  Sunny -> No
  Overcast -> Yes
  Rainy -> Yes

[Diagram: Model applied to New Data; prediction: Yes]
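A single-feature rule learner of this kind can be sketched as follows. The sketch makes two assumptions: the rows are the classic 14-instance "weather" toy data (the slide's table is not reproduced in this transcript), and features are ranked by the number of training errors their rule makes, used here as a simple stand-in for "most information". The name `one_r` and the new instance are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed toy data: (outlook, temperature, humidity, windy, play).
FEATURES = ["outlook", "temperature", "humidity", "windy"]
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def one_r(rows, feature_names):
    """Return (feature, rule, errors) for the single feature whose
    value -> majority-class rule makes the fewest training errors."""
    best = None
    for i, name in enumerate(feature_names):
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[i]][row[-1]] += 1  # class counts per feature value
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        # Strict "<" keeps the earlier feature when two features tie.
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best


feature, rule, errors = one_r(ROWS, FEATURES)
print(feature, rule, errors)

# Applying the learned rule to a hypothetical new instance.
new_instance = {"outlook": "rainy", "temperature": "cool",
                "humidity": "high", "windy": "false"}
print(rule[new_instance[feature]])
```

On this assumed data the learner selects Outlook and reproduces the rule shown on the slide; Humidity makes the same number of training errors, so the choice between them comes down to the tie-breaking order.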

More Complex Algorithm…

Two simple algorithms last time
  0R – Predict the majority class
  1R – Use the most predictive single feature

Today – Intro to Decision Trees
  Today we will stay at a high level
  We’ll investigate more details of the algorithm next time

* Only makes 2 mistakes!

What will it do with this example?

Why is it better?

Not because it is more complex
  Sometimes more complexity makes performance worse

What is different in what the three rule representations assume about your data?
  0R
  1R
  Trees

The best algorithm for your data will give you exactly the power you need

Let’s say you know the rule you are trying to learn is a circle and you have these points. What rule would you learn?

Now let’s say you don’t know the shape. What would you learn?

If you know the shape, you have fewer degrees of freedom – less room to make a mistake.
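The contrast can be made concrete with a rough Python sketch. Everything here is an assumption for illustration: the points are made up, the circle learner uses a crude centroid-and-maximum-radius fit, and the "unknown shape" learner is a 1-nearest-neighbor rule that simply memorizes the training points. The circle hypothesis has only three numbers to estimate; the memorizing learner has as many degrees of freedom as there are points.

```python
import math

# Made-up training points: (x, y, inside_concept).
POINTS = [
    (0.0, 1.0, True), (1.0, 0.0, True), (0.0, -1.0, True), (-1.0, 0.0, True),
    (2.0, 2.0, False), (-2.0, 2.0, False), (2.0, -2.0, False), (-2.0, -2.0, False),
]


def fit_circle(points):
    """Circle hypothesis: estimate only a center (cx, cy) and a radius r."""
    pos = [(x, y) for x, y, label in points if label]
    cx = sum(x for x, _ in pos) / len(pos)
    cy = sum(y for _, y in pos) / len(pos)
    r = max(math.hypot(x - cx, y - cy) for x, y in pos)
    return cx, cy, r


def circle_predict(model, x, y):
    cx, cy, r = model
    return math.hypot(x - cx, y - cy) <= r


def nearest_neighbor_predict(points, x, y):
    """No shape assumption: copy the label of the closest training point."""
    return min(points, key=lambda p: math.hypot(p[0] - x, p[1] - y))[2]


model = fit_circle(POINTS)
print(model)
print(circle_predict(model, 0.5, 0.5), nearest_neighbor_predict(POINTS, 0.5, 0.5))
```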

What do concepts look like?

Clarification: Concepts as Lines

[Figure: diagram with labels R, B, S, T, C and points marked X]

Machine Learning Process Overview

Get to know your data
  What distinguishes messages from different categories

Represent messages in terms of features
  Use feature table tab

Build machine learning model
  Use machine learning tab

Learn from mistakes, and try again
  Use feature analyzer tab

[Diagram labels: Features, Coding, Machine Learning]

Algorithms you will use

Decision Trees (J48): good with small feature sets, can find contingencies between features

Naïve Bayes: fast, makes decisions based on probabilities

Support Vector Machines (SMO): makes decisions based on weights, usually works well on text
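To make "decisions based on probabilities" concrete, here is a small multinomial Naïve Bayes sketch in plain Python. It is not the implementation behind the tools named above; the training messages are made up, add-one smoothing is an assumed design choice, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (text, label). Collect class and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training messages for illustration only.
EXAMPLES = [
    ("what is the common denominator", "question"),
    ("how do you find the answer", "question"),
    ("the answer is nine", "statement"),
    ("i think the denominator is twelve", "statement"),
]
model = train_naive_bayes(EXAMPLES)
print(classify(model, "how do you find the denominator"))
print(classify(model, "the answer is twelve"))
```

With only four training messages the probabilities are very rough, which is one reason the amount of coded data matters.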

Setting Up Your Data

How do you know when you have coded enough data?

What distinguishes Questions and Statements?

Not all questions end in a question mark.

Not all WH words occur in questions

I versus you is not a reliable predictor

You need to code enough to avoid learning rules that won’t work

Basic Idea

Represent text as a vector where each position corresponds to a term

This is called the “bag of words” approach

Vector positions: Cheese, Cows, Eggs, Hens, Lay, Make

Cows make cheese   110001

Hens lay eggs      001110

But same representation for “Cheese makes cows.”!
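The example above can be reproduced with a short Python sketch. The `normalize` helper is an assumption: it lowercases, strips punctuation, and drops a trailing "s" as a crude stand-in for stemming, so that "makes" and "make" map to the same position.

```python
def normalize(word):
    """Crude normalization (assumed): lowercase, strip punctuation, drop a final 's'."""
    word = word.lower().strip(".,!?\"'")
    return word[:-1] if word.endswith("s") else word


def bag_of_words(text, vocabulary):
    """Return a 0/1 string with one position per vocabulary term."""
    present = {normalize(w) for w in text.split()}
    return "".join("1" if term in present else "0" for term in vocabulary)


# Vector positions in the same order as on the slide.
VOCABULARY = [normalize(w) for w in ["Cheese", "Cows", "Eggs", "Hens", "Lay", "Make"]]

for sentence in ["Cows make cheese", "Hens lay eggs", "Cheese makes cows."]:
    print(sentence, bag_of_words(sentence, VOCABULARY))
```

Word order never enters the vector, which is why the first and third sentences come out identical.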

What can’t you conclude from “bag of words” representations?

Causality: “X caused Y” versus “Y caused X”

Roles and Mood: “Which person ate the food that I prepared this morning and drives the big car in front of my cat” versus “The person, which prepared food that my cat and I ate this morning, drives in front of the big car.”
  Who’s driving, who’s eating, and who’s preparing food?

Part of Speech Tagging

1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition/subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund/present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd ps. sing. present
32. VBZ Verb, 3rd ps. sing. present
33. WDT wh-determiner
34. WP wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB wh-adverb

http://www.ldc.upenn.edu/Catalog/docs/treebank2/cl93.html

Feature Space Design

Think like a computer!
  Machine learning algorithms look for features that are good predictors, not features that are necessarily meaningful

Look for approximations
  If you want to find questions, you don’t need to do a complete syntactic analysis
  Look for question marks
  Look for wh-terms that occur immediately before an auxiliary verb
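The two approximations for finding questions can be sketched as simple feature extractors in Python. The word lists below are assumed and deliberately short, and the function name `question_features` is hypothetical; neither comes from the tools discussed here.

```python
import re

# Assumed, deliberately incomplete word lists.
WH_TERMS = {"who", "what", "when", "where", "why", "which", "how"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did", "can",
               "could", "will", "would", "should", "has", "have", "had"}


def question_features(message):
    """Two cheap cues that approximate 'is this a question?'."""
    tokens = re.findall(r"[a-z']+", message.lower())
    wh_before_aux = any(
        first in WH_TERMS and second in AUXILIARIES
        for first, second in zip(tokens, tokens[1:])
    )
    return {
        "has_question_mark": "?" in message,
        "wh_before_aux": wh_before_aux,
    }


print(question_features("What is the common denominator"))
print(question_features("you think the answer is 9?"))
print(question_features("I know what you mean."))
```

Neither cue is a definition of a question. Each is only a predictor, and a relative clause such as "the answer which is …" would still trigger the wh-term cue, so the learner has to weigh the cues against the coded data.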


Punctuation can be a “stand in” for mood
  “you think the answer is 9?” versus “you think the answer is 9.”

Bigrams capture simple lexical patterns
  “common denominator” versus “common multiple”

POS bigrams capture syntactic or stylistic information
  “the answer which is …” vs “which is the answer”

Line length can be a proxy for explanation depth
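Three of these feature types (punctuation, word bigrams, and line length) can be sketched in plain Python. POS bigrams are left out because they need a part-of-speech tagger. The feature names and the `extract_features` function are hypothetical.

```python
def extract_features(message):
    """Build a small feature dictionary for one conversational contribution."""
    text = message.strip()
    tokens = text.lower().rstrip("?.!").split()
    features = {
        # Punctuation as a stand-in for mood.
        "ends_with_question_mark": text.endswith("?"),
        # Line length (in tokens) as a proxy for explanation depth.
        "line_length": len(tokens),
    }
    # Word bigrams capture simple lexical patterns.
    for first, second in zip(tokens, tokens[1:]):
        features[f"bigram={first}_{second}"] = True
    return features


question = extract_features("you think the answer is 9?")
statement = extract_features("you think the answer is 9.")
print(question["ends_with_question_mark"], statement["ends_with_question_mark"])
print(sorted(extract_features("the common denominator")))
```

The two example messages differ only in the punctuation feature, while the bigram features separate "common denominator" from "common multiple" even though both share the word "common".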


“Contains non-stop word” can be a predictor of whether a conversational contribution is contentful
  “ok sure” versus “the common denominator”

Removing stop words removes some distracting features

Stemming allows some generalization
  Multiple, multiply, multiplication

Removing rare features is a cheap form of feature selection
  Features that only occur once or twice in the corpus won’t generalize, so they are a waste of time to include in the vector space
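Two of these ideas, the "contains non-stop word" feature and rare-feature removal, can be sketched in Python. The stop-word list is a tiny assumed one, the corpus is made up, and the threshold of three occurrences is an illustrative reading of "once or twice". Stemming is not shown because it needs a stemming algorithm.

```python
from collections import Counter

# Tiny assumed stop-word list for illustration only.
STOP_WORDS = {"ok", "sure", "the", "a", "an", "is", "it", "and", "of", "to", "yes", "no"}


def contains_non_stop_word(message):
    """True if at least one token is not a stop word."""
    return any(tok not in STOP_WORDS for tok in message.lower().split())


def frequent_vocabulary(messages, min_count=3):
    """Keep only non-stop terms that occur at least min_count times in the corpus."""
    counts = Counter(
        tok
        for message in messages
        for tok in message.lower().split()
        if tok not in STOP_WORDS
    )
    return sorted(term for term, n in counts.items() if n >= min_count)


# Made-up corpus.
MESSAGES = [
    "the common denominator",
    "common denominator is 12",
    "find the common multiple",
    "ok sure",
]
print(contains_non_stop_word("ok sure"), contains_non_stop_word("the common denominator"))
print(frequent_vocabulary(MESSAGES))
```

Raising or lowering `min_count` trades vocabulary size against coverage, so it is worth checking how many features survive the cut on the real corpus.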

Error Analysis

Any Questions?
