Developing and validating a document classifier: a real-life story - Marko Smiljanic

Posted on 06-Jan-2017



Marko Smiljanić, CEO, NIRI Intelligent Computing Ltd

Developing and validating a document classifier: a real-life story

Marko Smiljanić, CEO

www.niri-ic.com

About us.

NIRI: 10 years in Intelligent Computing. Text Mining, Knowledge Discovery and Management. All about Data Science.

NIŠ

About me: my role in the company.

The flow:
- Business Context
- The Challenge
- The Solution
- Effectiveness: laboratory measurements, impact estimation, reality
- Wrap up

Business context


Largest clients include Public Employment Services in the EU, USA, and Asia, and staffing companies in the EU and USA.

The ELISE Platform matches Vacancies with Job Seekers through a Job Taxonomy and a Skill Taxonomy.


Document Classification: assigning each Vacancy a class from the Job Taxonomy.

Occupation taxonomies: ISCO (International Standard Classification of Occupations), ESCO, O*NET, and many more. ISCO has 10 classes at level 1, 42 at level 2, 124 at level 3, and 400 at level 4; ESCO adds 5000 at level 5.

“Delivery service worker”
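The ISCO coding scheme is digit-prefix hierarchical: each extra digit refines the level above. A minimal sketch of reading the hierarchy out of a code (the example code "9621" is illustrative, not the talk's mapping for "Delivery service worker"):

```python
# ISCO codes are hierarchical: the first digit is the level-1 major group,
# the first two digits the level-2 sub-major group, and so on.
def isco_levels(code: str) -> dict[int, str]:
    """Return the ancestor code at each taxonomy level for an ISCO code."""
    return {level: code[:level] for level in range(1, len(code) + 1)}

print(isco_levels("9621"))  # {1: '9', 2: '96', 3: '962', 4: '9621'}
```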

Challenges (for humans): knowing the taxonomy, ambiguous taxonomy, hybrid positions, vague vacancies.

Client’s situation in 2014

Vacancy → Aggregator and Classifier → Correct code? If NO, repair the code; then publish.

Outcomes: OK 65%; wrong code 23% (of these, repaired to OK 9%, classifier offered no help 14%); no code 12%. Volume: 2000-4000 vacancies per day, into >2000 taxonomy classes. Overall published accuracy: unknown (%?).


The Solution: NIRI will build you a better classifier

Vacancy → Aggregator and Classifier → NIRI Classifier → Publish (2000-4000 per day).

Really? How accurate will it be? How will it fit our process?

Really. We will (try to): reduce manual effort, increase volume, and improve final accuracy. But you need to give us training data: more than 1M vacancies.

Training data composition: Verified 74%, Not verified 14%, No class 12% (accuracy of the labels themselves: unknown).

Long tail effect across the taxonomy classes.
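The long tail effect means a few taxonomy classes account for most vacancies, while most of the >2000 classes have very few training examples. A quick, illustrative way to quantify it (the labels below are made up):

```python
from collections import Counter

def classes_covering(labels: list[str], fraction: float = 0.8) -> int:
    """How many of the most frequent classes cover `fraction` of all examples."""
    counts = Counter(labels).most_common()
    target, covered, k = fraction * len(labels), 0, 0
    for _, count in counts:
        covered += count
        k += 1
        if covered >= target:
            break
    return k

# Toy corpus: 6 classes, but 3 of them already cover 80% of the examples.
labels = ["A"] * 8 + ["B"] * 6 + ["C"] * 3 + ["D", "E", "F"]
print(classes_covering(labels))  # 3
```

In a long-tailed corpus this number is tiny relative to the total class count, which is exactly what makes the rare classes hard to learn.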

Architecture of our solution

The Vacancy Classifier takes a vacancy as input and produces one or more [Class, Confidence] pairs. Internally, a Feature Extractor feeds an ensemble of classifiers (Classifier 1, Classifier 2, …, Classifier N) whose outputs are combined by a Negotiator; external services are consulted where needed.
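The ensemble-plus-negotiator design can be sketched as follows. The talk does not specify how the Negotiator combines votes; the confidence-summing rule here is an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    cls: str           # taxonomy class code
    confidence: float  # in [0, 1]

def negotiate(predictions: list[Prediction]) -> Prediction:
    """Merge per-classifier [class, confidence] outputs by summing
    confidence per class and normalizing by the number of classifiers."""
    scores: dict[str, float] = {}
    for p in predictions:
        scores[p.cls] = scores.get(p.cls, 0.0) + p.confidence
    best = max(scores, key=scores.get)
    return Prediction(best, scores[best] / len(predictions))
```

For example, three classifiers voting ("5412", 0.9), ("5412", 0.6), and ("9621", 0.7) would negotiate to class "5412" with confidence 0.5 (class codes here are arbitrary placeholders).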

What to do with confidence?

Each classification is a (Vacancy, Code, Confidence) triple. In batch processing, results are sorted by confidence: high-confidence results (high accuracy) are bulk accepted, while low-confidence results (low accuracy) are sent to be checked manually.

Using confidence
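Confidence-based routing in batch processing might look like this; the 0.9 threshold is an assumed value, not a figure from the talk:

```python
def route(results, threshold=0.9):
    """Split (vacancy, code, confidence) triples into a bulk-accepted
    pile and a manually-checked pile based on a confidence threshold."""
    bulk_accept, manual_check = [], []
    for vacancy, code, confidence in results:
        if confidence >= threshold:
            bulk_accept.append((vacancy, code))
        else:
            manual_check.append((vacancy, code))
    return bulk_accept, manual_check
```

In practice the threshold would be tuned so that the accuracy of the bulk-accepted pile meets the client's quality target.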


Measuring accuracy in the laboratory

Data: Verified 74%, Not verified 14%, No class 12%.

Evaluation: 5-fold cross-validation on the verified data (80% train / 20% test, repeated 5 times). Each test classification is counted as Correct, Incorrect, or No class.
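The 80/20 × 5 evaluation is standard 5-fold cross-validation; a stdlib-only sketch of the splitting step:

```python
def five_fold_splits(n_items: int):
    """Yield (train_indices, test_indices) for 5 folds: each fold holds
    out ~20% of the items for testing and trains on the remaining ~80%."""
    indices = list(range(n_items))
    fold = n_items // 5
    for k in range(5):
        # Last fold absorbs any remainder so every item is tested once.
        test = indices[k * fold:(k + 1) * fold] if k < 4 else indices[4 * fold:]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test
```

In a real pipeline the items would be shuffled (and likely stratified by class) before splitting, which matters given the long tail of rare classes.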

One of many laboratory measurements of the Vacancy Classifier:

            Corpus   Classifier   Classifier 100   Classifier 1000
Correct       74%       78%            80%              85%
Incorrect     14%       13%            12%              10%
No class      12%        9%             8%               5%

Measuring accuracy in the laboratory

Does this make any sense?

Yes, but…

Measuring accuracy in the laboratory

Original classifier (the corpus labels): Verified 74%, Not verified 14%, No class 12%. NIRI Vacancy Classifier: Correct 78%, Incorrect 13%, No class 9%.

But this is not reality: the train/test set is biased, the accuracy of the test set itself is unknown, and we cannot test against the remaining 26% of the data.


Remember the process?

Vacancy → Aggregator and Classifier → Correct code? If NO (23%), repair; then publish. OK 65%; repaired to OK 9%, no help 14%; no code 12%.

This is what it actually looks like: a two-step Check → Repair process.

We will: reduce manual effort, increase volume, improve final accuracy. And we proposed this process: Bulk Accept → Check → Repair.

Best/worst case analysis, some manual validation, careful assumptions.

For the proposed Bulk Accept → Check → Repair process, impact estimation showed:
- Step 1 effort reduction: 60% (due to bulk acceptance)
- Step 2 effort reduction: 11% (due to bulk acceptance and top-5 offers)
- Significant published volume increase (almost to 100%)
- Slightly higher accuracy (+1%, to around 92%)

Does this make any sense?

Yes, but…


Recall the data: Verified 74%, Not verified 14%, No class 12%; production accuracy unknown (%?).

How can we measure production accuracy?

We cannot, unless…

Golden Test Set

How was it built? By check & repair under the four-eye principle: for each published vacancy, reviewers were shown the original code together with the top 5 Vacancy Classifier codes. Every single classification was marked as either Correct, Acceptable, or Wrong.

Results

Golden Test Set results ("HQ source" = restricted to the highest-quality training source):

                        Current   NIRI VC   Current (HQ source)   NIRI VC (HQ source)
Correct                 63.05%    73.91%    72.06%                74.38%
Correct + Acceptable    65.98%    77.56%    76.25%                78.69%
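Scoring against the golden test set reduces to counting verdicts; a minimal sketch (the verdict strings are assumed names for the talk's Correct/Acceptable/Wrong labels):

```python
from collections import Counter

def golden_scores(labels: list[str]) -> tuple[float, float]:
    """Given one 'correct'/'acceptable'/'wrong' verdict per classification,
    return (correct rate, correct-or-acceptable rate)."""
    counts = Counter(labels)
    n = len(labels)
    correct = counts["correct"] / n
    correct_or_acceptable = (counts["correct"] + counts["acceptable"]) / n
    return correct, correct_or_acceptable
```

The two rates correspond to the two rows of the results table: the strict "Correct" rate and the lenient "Correct + Acceptable" rate.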


Wrap up

Clean semantic data, in real life, can only be a myth. We are looking into data cleansing approaches. Measuring usefulness can be hard and expensive, but it can and must be monitored after the system is deployed: it changes over time. Continuous learning, where possible, is a great thing.

1) Implementing a state-of-the-art machine learning algorithm is one thing. 2) Making it useful is another. 3) Explaining that to the end user is a third.

NIRI is a very cool company to work with!

I hope you liked the story, and I thank you for your attention.

Developing and validating a document classifier: a real-life story

Marko Smiljanić, CEO

www.niri-ic.com