
Active Learning to Classify Email

4/22/05

What’s the problem?

How will I ever sort all these new emails?

What’s the problem? To get a sense of the mail I have received, I will need to sort these new messages.

A great solution would be if I could sort just a few and my computer could sort the rest for me.

To make it really accurate, the assistant could even pick which messages I should manually sort, so that it can learn to do the best job possible. (Active Learning)

What’s the solution? To solve this problem, we need a way to choose the most informative training examples. This requires some way of ranking emails by how informative they are for classification.
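As a concrete illustration, here is a minimal sketch of the pool-based active-learning loop this implies, in Python (scikit-learn and the synthetic data are assumptions; the slides name no implementation). The `informativeness` hook is a placeholder that later slides fill in.

```python
# Minimal pool-based active-learning loop (a sketch; scikit-learn and the
# synthetic data are assumptions). `informativeness` is a placeholder hook
# later filled in with margin distance, diversity, and chrono-diversity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

def active_learn(X, y, informativeness, seed_size=10, budget=20):
    labeled = list(range(seed_size))             # a few hand-sorted emails
    pool = list(range(seed_size, len(y)))
    clf = LinearSVC()
    for _ in range(budget):
        clf.fit(X[labeled], y[labeled])
        scores = informativeness(clf, X[pool])   # value of labeling each email
        labeled.append(pool.pop(int(np.argmax(scores))))  # user sorts this one
    return clf.fit(X[labeled], y[labeled])

X, y = make_classification(n_samples=300, random_state=0)
baseline = active_learn(X, y, lambda clf, Xp: np.random.rand(len(Xp)))  # random queries
```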

Email Classification So, what do we know about email classification?

SVM and Naïve Bayes significantly outperform many other methods (Brutlag 2000, Kiritchenko 2001).

Both SVM and Naïve Bayes support the “online” learning that solving this problem effectively requires (Cauwenberghs 2000).

Classifier accuracy varies more between users than between algorithms (Kiritchenko 2001).

SVM performs better for users with more email in each folder (Brutlag 2000).

Users with more email, such as in our example problem, tend to have more email in each folder than other users (Klimt 2004).

Thus, we have chosen SVM as the basis for this research.

“Bag-of-Words” Model

[diagram: email data → “bag of words” → SVM → classification decision]
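A minimal sketch of this pipeline, assuming scikit-learn; the folder names and messages are hypothetical:

```python
# Bag-of-words + linear SVM pipeline (a sketch; scikit-learn, the folder
# names, and the messages are all illustrative assumptions).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

emails = [
    "meeting moved to 3pm tomorrow",
    "quarterly revenue report attached",
    "lunch on friday?",
    "budget numbers for the quarterly report",
]
folders = ["calendar", "finance", "calendar", "finance"]

clf = make_pipeline(CountVectorizer(), LinearSVC())  # email data -> bag of words -> SVM
clf.fit(emails, folders)
print(clf.predict(["send me the revenue numbers"]))  # classification decision
```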

Multiple SVMs Using separate SVMs for each section

[diagram: email data → SVMs → LLSF → classification decision]
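A sketch of the per-section idea, with each section’s SVM margin score combined by a linear least-squares fit, as the LLSF box in the diagram suggests; the section names and data are illustrative assumptions:

```python
# One SVM per email section, outputs combined by linear least squares
# (LLSF). The sections ("subject", "body") and the data are illustrative.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 200
X_subject = rng.normal(size=(n, 30))     # per-section bag-of-words features
X_body = rng.normal(size=(n, 100))
y = (X_subject[:, 0] + X_body[:, 0] > 0).astype(int)

svm_subject = LinearSVC().fit(X_subject, y)
svm_body = LinearSVC().fit(X_body, y)

# Stack each section's margin score and fit least-squares weights to the labels.
scores = np.column_stack([
    svm_subject.decision_function(X_subject),
    svm_body.decision_function(X_body),
    np.ones(n),                          # bias term
])
w, *_ = np.linalg.lstsq(scores, y, rcond=None)
decision = (scores @ w > 0.5).astype(int)   # final classification decision
```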

Active Learning with SVM In general, examples closer to the decision hyperplane will cause larger displacement of that boundary when labeled (Schohn and Cohn 2000, Tong 2001).
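A sketch of this margin heuristic, querying the unlabeled email that lies closest to the current hyperplane (scikit-learn and the synthetic data are assumptions):

```python
# Margin-based querying: label the pool example closest to the SVM
# hyperplane, i.e. with the smallest |w.x + b| (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
labeled = list(range(10))                      # seed: hand-sorted emails
pool = list(range(10, len(y)))

clf = LinearSVC().fit(X[labeled], y[labeled])
dist = np.abs(clf.decision_function(X[pool]))  # proportional to boundary distance
query = pool[int(np.argmin(dist))]             # ask the user about this email
```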

What if our prediction is right? Labeling the closer example: [figure]

Labeling the farther example: [figure]

And if our prediction is wrong? Picking the closer example: [figure]

Picking the farther example: [figure]

Incorporating Diversity In this example, the instance near the top is intuitively more likely to be informative. This is known as “diversity” (Brinker 2003).
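A sketch of a Brinker-style batch selection that trades closeness to the hyperplane against the cosine angle to examples already chosen; the exact weighting is an assumption:

```python
# Diversity-aware batch selection in the spirit of Brinker (2003):
# prefer examples near the hyperplane AND at a large angle to examples
# already picked. `lam` is the trade-off (the exact form is an assumption).
import numpy as np
from sklearn.preprocessing import normalize

def diverse_batch(clf, X_pool, batch_size=5, lam=0.5):
    margins = np.abs(clf.decision_function(X_pool))
    Xn = normalize(X_pool)                      # unit vectors for cosine angles
    chosen = []
    for _ in range(batch_size):
        if chosen:
            sim = np.abs(Xn @ Xn[chosen].T).max(axis=1)  # closest batch member
        else:
            sim = np.zeros(len(X_pool))
        score = lam * margins + (1 - lam) * sim  # smaller is better
        score[chosen] = np.inf                   # never re-pick an example
        chosen.append(int(np.argmin(score)))
    return chosen

# Usage on synthetic data:
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
X, y = make_classification(n_samples=100, random_state=0)
clf = LinearSVC().fit(X[:20], y[:20])
print(diverse_batch(clf, X[20:]))
```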

Active Learning with SVM But what about when you have multiple SVMs (like one-vs-rest)? (Yan 2003)

The Enron Corpus

150+ users, 200,000 emails

Initial Results Trained on 10%, Tested on 90%

Chrono-Diverse Algorithm The way a user sorts email changes over time. Pick training data that are maximally different from previous data with respect to time.
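A sketch of the chronological-diversity score, assuming each email carries a timestamp:

```python
# Sketch of the chrono-diverse score: how far (in time) is each candidate
# from the nearest already-labeled email? Query the most distant one.
# Timestamps in days are an illustrative assumption.
import numpy as np

def chrono_score(t_pool, t_labeled):
    t_pool, t_labeled = np.asarray(t_pool, float), np.asarray(t_labeled, float)
    return np.abs(t_pool[:, None] - t_labeled[None, :]).min(axis=1)

t_labeled = [0.0, 1.0, 2.0]                 # days of already-sorted emails
t_pool = [2.5, 30.0, 90.0, 91.0]
query = int(np.argmax(chrono_score(t_pool, t_labeled)))  # picks the 91-day email
```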

Combination Algorithm Combine the strengths of the Standard and Chrono-Diverse strategies: take a weighted combination of their scores, and adjust the weighting with a parameter lambda.
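A sketch of the blend; the exact form of the lambda weighting is an assumption:

```python
# Rescale the standard (margin) score and the chrono-diversity score to
# [0, 1], then blend with weight lambda (the exact form is an assumption).
import numpy as np

def combined_score(margin_dist, chrono_dist, lam=0.5):
    def scale(v):
        v = np.asarray(v, dtype=float)
        return (v - v.min()) / (v.max() - v.min() + 1e-12)
    # Standard prefers SMALL margin distance; chrono prefers LARGE time gap.
    return lam * (1 - scale(margin_dist)) + (1 - lam) * scale(chrono_dist)

query = int(np.argmax(combined_score([0.1, 0.9, 0.5], [5.0, 40.0, 90.0], lam=0.7)))
```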

Results Trained on 10%, Tested on 90%

Parameter Tuning

Conclusions The state-of-the-art algorithm for active learning with text classification performs horribly on email data!

Choosing emails for time diversity works very well.

Combining the two works best.

Future Work

Improve the efficiency of SVM or find a better alternative

Determine when using chronological diversity performs best and worst

Adapt the algorithm to online classification