Moving Ahead: Creative Feature Extraction and Error Analysis Techniques Carolyn Penstein Rosé...


Page 1:

Moving Ahead: Creative Feature Extraction and Error Analysis Techniques

Carolyn Penstein Rosé
Carnegie Mellon University

Funded through the Pittsburgh Science of Learning Center and The Office of Naval Research, Cognitive and Neural Sciences Division

Page 2:

Outline

* New Feature Creation
* Error Analysis

Page 3:

New Feature Creation

Page 4:

Why create new features?

* You may want to generalize across sets of related words
  * Color = {red,yellow,orange,green,blue}
  * Food = {cake,pizza,hamburger,steak,bread}
* You may want to detect contingencies
  * The text must mention both cake and presents in order to count as a birthday party
* You may want to combine these
  * The text must include a color and a food

Page 5:

Why create new features by hand?

* More likely to capture meaningful generalizations
* Build in knowledge so you can get by with less training data

Page 6:

Rule Language

* ANY() is used to create lists
  * COLOR = ANY(red,yellow,green,blue,purple)
  * FOOD = ANY(cake,pizza,hamburger,steak,bread)
* ALL() is used to capture contingencies
  * ALL(cake,presents)
* More complex rules
  * ALL(COLOR,FOOD)
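The semantics of ANY() and ALL() can be sketched in a few lines of Python. This is only an illustration of the idea, not TagHelper's implementation: the tuple representation, the `matches` and `tokenize` helpers, and the rule name `BIRTHDAY` are assumptions made for this example.

```python
import re


def ANY(*parts):
    """Rule that matches when at least one of its parts matches."""
    return ("ANY", parts)


def ALL(*parts):
    """Rule that matches only when every one of its parts matches."""
    return ("ALL", parts)


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def matches(rule, tokens):
    """Evaluate a (possibly nested) rule against a set of tokens."""
    if isinstance(rule, str):  # a bare word is a primitive feature
        return rule in tokens
    op, parts = rule
    results = (matches(part, tokens) for part in parts)
    return any(results) if op == "ANY" else all(results)


COLOR = ANY("red", "yellow", "green", "blue", "purple")
FOOD = ANY("cake", "pizza", "hamburger", "steak", "bread")
BIRTHDAY = ALL("cake", "presents")   # hypothetical name for the contingency rule
COLOR_AND_FOOD = ALL(COLOR, FOOD)

print(matches(COLOR_AND_FOOD, tokenize("She baked a blue cake")))  # True
print(matches(BIRTHDAY, tokenize("We ate cake")))                  # False
```

Because rules are plain nested values, a list such as COLOR can be reused inside any larger rule, which is what makes ALL(COLOR,FOOD) possible.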

Page 7:

Group Project: Make a rule that will match against questions but not statements

Question Tell me what your favorite color is.

Statement I tell you my favorite color is blue.

Question Where do you live?

Statement I live where my family lives.

Question Which kinds of baked goods do you prefer?

Statement I prefer to eat wheat bread.

Question Which courses should I take?

Statement You should take my applied machine learning course.

Question Tell me when you get up in the morning.

Statement I get up early.

Page 8:

Possible Rule

ANY(ALL(tell,me),BOL_WDT,BOL_WRB)
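One way to check this rule is to run it over the ten examples from the group project. The sketch below reads BOL_WDT and BOL_WRB as POS bigrams in which a beginning-of-line marker is followed by a wh-determiner (WDT) or wh-adverb (WRB) tag; that reading is an interpretation, since the slides do not define the names. The `WH_TAGS` lookup is a hypothetical stand-in for a real part-of-speech tagger and covers only a handful of words.

```python
import re

# Hypothetical mini lexicon standing in for a POS tagger (assumption).
WH_TAGS = {"which": "WDT", "where": "WRB", "when": "WRB", "why": "WRB", "how": "WRB"}


def primitive_features(text):
    """Return the unigrams plus a BOL_<tag> feature for a sentence-initial wh-word."""
    words = re.findall(r"[a-z']+", text.lower())
    features = set(words)
    if words and words[0] in WH_TAGS:
        features.add("BOL_" + WH_TAGS[words[0]])
    return features


def is_question(text):
    """Hand-coded form of ANY(ALL(tell,me),BOL_WDT,BOL_WRB)."""
    f = primitive_features(text)
    return ("tell" in f and "me" in f) or "BOL_WDT" in f or "BOL_WRB" in f


EXAMPLES = [
    ("Question", "Tell me what your favorite color is."),
    ("Statement", "I tell you my favorite color is blue."),
    ("Question", "Where do you live?"),
    ("Statement", "I live where my family lives."),
    ("Question", "Which kinds of baked goods do you prefer"),
    ("Statement", "I prefer to eat wheat bread."),
    ("Question", "Which courses should I take?"),
    ("Statement", "You should take my applied machine learning course."),
    ("Question", "Tell me when you get up in the morning."),
    ("Statement", "I get up early."),
]

for label, text in EXAMPLES:
    predicted = "Question" if is_question(text) else "Statement"
    print(f"{label:9} -> {predicted:9} {text}")
```

Under these assumptions the rule separates all ten examples. Note that "I tell you my favorite color is blue." is rejected because it contains "tell" but not "me", and "I live where my family lives." is rejected because "where" is not at the beginning of the line.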

Page 9:

Advanced Feature Editing

* Click here

Page 10:

Types of Basic Features

* Primitive features include unigrams, bigrams, and POS bigrams
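As a rough illustration of the three primitive feature types, the sketch below builds them for one sentence. The part-of-speech tags are supplied by hand (Penn Treebank style) rather than produced by a tagger, the underscore-joined naming mirrors the BOL_WDT style used in the rule above, and the `EOL` end marker is an assumption added for symmetry with `BOL`.

```python
def ngrams(items, n):
    """Join every run of n consecutive items with an underscore."""
    return ["_".join(items[i:i + n]) for i in range(len(items) - n + 1)]


# Hand-tagged example sentence; a real pipeline would use a POS tagger.
tagged = [("where", "WRB"), ("do", "VBP"), ("you", "PRP"), ("live", "VB")]

words = [word for word, _ in tagged]
tags = ["BOL"] + [tag for _, tag in tagged] + ["EOL"]  # EOL is an assumed marker

unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
pos_bigrams = ngrams(tags, 2)

print(unigrams)     # ['where', 'do', 'you', 'live']
print(bigrams)      # ['where_do', 'do_you', 'you_live']
print(pos_bigrams)  # ['BOL_WRB', 'WRB_VBP', 'VBP_PRP', 'PRP_VB', 'VB_EOL']
```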

Page 11:

Types of Basic Features

* The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists
  * You can choose to remove stopwords or not
  * You can choose whether or not to strip endings off words with stemming
  * You can choose how frequently a feature must appear in your data in order for it to show up in your lists
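The effect of these three options on a unigram list can be sketched as follows. Everything here is illustrative: the stopword list is a tiny hand-picked sample, `crude_stem` is a deliberately simple stand-in for a real stemmer, and the frequency threshold is assumed to count the number of texts a feature appears in, which may differ from how TagHelper counts.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption), not a standard one.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "i", "my"}


def crude_stem(word):
    """Strip a few common endings; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unigram_list(texts, remove_stopwords=False, stem=False, min_count=1):
    """Return the sorted unigrams that survive the chosen options."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        if remove_stopwords:
            words = [w for w in words if w not in STOPWORDS]
        if stem:
            words = [crude_stem(w) for w in words]
        counts.update(set(words))  # count each feature once per text
    return sorted(w for w, n in counts.items() if n >= min_count)


texts = [
    "I live where my family lives.",
    "Where do you live?",
    "I prefer to eat wheat bread.",
]

print(unigram_list(texts))
print(unigram_list(texts, remove_stopwords=True, stem=True, min_count=2))
# second call -> ['live', 'where']
```

With no options set, every distinct word appears in the list. Turning on all three leaves only the features that occur in at least two texts after stopword removal and stemming.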

Page 12:

Types of Basic Features

* Now let’s look at how to create new features.

Page 13:

Creating New Features

* The feature editor allows you to create new feature definitions

* Click on + to add your new feature

Page 14:

Examining a New Feature

• Right click on a feature to examine where it matches in your data

Page 15:

Examining a New Feature

Page 16:

Error Analysis

Page 17:

Create an Error Analysis File

Page 18:

Use TagHelper to Code Uncoded File

• The output file contains the codes TagHelper assigned.

• What you want to do now is remove the prediction column and insert the correct answers next to the TagHelper-assigned answers.
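For readers who prefer to script this step rather than edit the spreadsheet by hand, the sketch below shows one possible way to do it on CSV text. The file layout is an assumption: the column names `text`, `Code`, and `prediction`, the values in the sample, and the helper `build_error_analysis` are all invented for illustration, and the slides do not specify what TagHelper's output file actually looks like.

```python
import csv
import io


def build_error_analysis(coded_csv, answers, drop_column="prediction"):
    """Drop one column and append an Answer column holding the correct codes."""
    reader = csv.DictReader(io.StringIO(coded_csv))
    rows = list(reader)
    if len(rows) != len(answers):
        raise ValueError("need exactly one correct answer per coded row")
    fields = [f for f in reader.fieldnames if f != drop_column] + ["Answer"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row, answer in zip(rows, answers):
        row.pop(drop_column, None)  # remove the unwanted column if present
        row["Answer"] = answer      # correct code goes next to the assigned code
        writer.writerow(row)
    return out.getvalue()


# Invented sample standing in for a coded output file.
coded = (
    "text,Code,prediction\n"
    "Where do you live?,Question,0.91\n"
    "I get up early.,Question,0.55\n"
)
print(build_error_analysis(coded, ["Question", "Statement"]))
```

In this toy sample the second row ends up with Code "Question" beside Answer "Statement", which is exactly the kind of disagreement the error analysis is meant to surface.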

Page 19:

Load Error Analysis File

Page 20:

Load Error Analysis File

Page 21:

Error Analysis Strategies

* Look for large error cells in the confusion matrix
* Locate the examples that correspond to that cell
* What features do those examples share?
* How are they different from the examples that were classified correctly?
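The four steps above can be sketched in Python on a toy data set. The rows, their predicted labels, and the helper names are all invented for illustration, and the "shared features" here are just unigrams; a real analysis would use whatever features the model was trained on.

```python
import re
from collections import Counter


def words(text):
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def confusion_matrix(rows):
    """Count rows for each (correct answer, predicted code) cell."""
    return Counter((r["answer"], r["predicted"]) for r in rows)


def largest_error_cell(matrix):
    """Return the off-diagonal cell with the most examples, or None."""
    errors = {cell: n for cell, n in matrix.items() if cell[0] != cell[1]}
    return max(errors, key=errors.get) if errors else None


def distinctive_features(rows, cell):
    """Words shared by several errors in the cell but absent from the
    correctly classified examples of the same true class."""
    answer, _ = cell
    wrong = [r for r in rows if (r["answer"], r["predicted"]) == cell]
    right = [r for r in rows if r["answer"] == answer == r["predicted"]]
    seen_in_right = {w for r in right for w in words(r["text"])}
    counts = Counter(w for r in wrong for w in words(r["text"]))
    return {w: n for w, n in counts.items() if n > 1 and w not in seen_in_right}


# Toy rows with invented predictions.
rows = [
    {"text": "Where do you live?", "answer": "Question", "predicted": "Question"},
    {"text": "Tell me what your favorite color is.", "answer": "Question", "predicted": "Statement"},
    {"text": "Tell me when you get up in the morning.", "answer": "Question", "predicted": "Statement"},
    {"text": "I get up early.", "answer": "Statement", "predicted": "Statement"},
    {"text": "I live where my family lives.", "answer": "Statement", "predicted": "Question"},
]

cell = largest_error_cell(confusion_matrix(rows))
print(cell)                              # ('Question', 'Statement')
print(distinctive_features(rows, cell))  # 'tell' and 'me', each seen twice
```

In this toy case the largest error cell holds questions coded as statements, and the words those errors share ("tell", "me") suggest a new feature such as ALL(tell,me).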

Page 22:

Group Project

* Load in the NewsGroupTrain.xls data set
* What is the best performance you can get by playing with the standard TagHelper tools feature options?
* Train a model using the best settings and then use it to assign codes to NewsGroupTest.xls
* Copy in Answer column from NewsGroupAnswers.xls
* Now do an error analysis to determine why frequent mistakes are being made
* How could you do better?
