Using Deep Learning And NLP To Predict Performance From Resumes

Using Deep Learning To Predict Performance From Resumes Ben Taylor, Chief Data Scientist

Page 1: Using Deep Learning And NLP To Predict Performance From Resumes

Using Deep Learning To Predict Performance From Resumes

Ben Taylor, Chief Data Scientist

Page 2: Using Deep Learning And NLP To Predict Performance From Resumes

INTRODUCTIONS

Page 3: Using Deep Learning And NLP To Predict Performance From Resumes

Ben Taylor @bentaylordata

Background Personal

Page 4: Using Deep Learning And NLP To Predict Performance From Resumes

• Sequoia Capital

• Largest Video Interviewing Platform

• Forbes #10 most promising companies

• Global: 189 countries

Page 5: Using Deep Learning And NLP To Predict Performance From Resumes

NATURAL LANGUAGE PROCESSING (NLP)

Page 6: Using Deep Learning And NLP To Predict Performance From Resumes

GRIT  MOTIVATION  ENGAGEMENT  PERFORMANCE
1     55          80          95%
0     75          10          22%
0     50          20          57%
1     20          90          91%
0     40          60          11%

Basic tutorial on how to build a numeric feature model

BUILDING A MODEL

Page 7: Using Deep Learning And NLP To Predict Performance From Resumes

ESSAY                       GRIT  MOTIVATION  ENGAGEMENT  PERFORMANCE
I want to work here         1     55          80          95%
I have great teamwork       0     75          10          22%
Synergy                     0     50          20          57%
I have so much grit         1     20          90          91%
They fired that individual  0     40          60          11%

Now what?!?

BUILDING A MODEL

Page 8: Using Deep Learning And NLP To Predict Performance From Resumes

ESSAY                       PERFORMANCE
I want to work here         95%
I have great teamwork       22%
Synergy                     57%
I have so much grit         91%
They fired that individual  11%

There are really two different options: mapping or tokenizing.

BUILDING A MODEL

Map: Bad=0, Good=1, Better=2, Best=3

Tokenize: Female=1, Male=1

Female  Male
1       0
0       1
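
The two options on this slide can be sketched in a few lines of plain Python. This is only an illustration: the helper names `map_ordinal` and `tokenize_categories` are hypothetical and do not come from the talk.

```python
# Mapping: categories with a natural order become increasing integers.
RATING_MAP = {"Bad": 0, "Good": 1, "Better": 2, "Best": 3}


def map_ordinal(values, mapping):
    """Replace each ordered category with its integer rank."""
    return [mapping[v] for v in values]


def tokenize_categories(values):
    """Build one 0/1 indicator column per distinct category (no order implied)."""
    columns = sorted(set(values))
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


ratings = map_ordinal(["Good", "Best", "Bad"], RATING_MAP)
columns, rows = tokenize_categories(["Female", "Male"])

print(ratings)  # [1, 3, 0]
print(columns)  # ['Female', 'Male']
print(rows)     # [[1, 0], [0, 1]]
```

Mapping suits values where "Best" really is more than "Good"; tokenizing avoids inventing an order where none exists.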

Page 9: Using Deep Learning And NLP To Predict Performance From Resumes

I  want  to  work  here  have  great  PERF.
1  1     1   1     1     0     0      95%
1  0     0   0     0     1     1      22%
0  0     0   0     0     0     0      57%
1  0     0   0     0     1     0      91%
0  0     0   0     0     0     0      11%

Tokenize the text into unique word columns

BUILDING A MODEL

ESSAY                       PERFORMANCE
I want to work here         95%
I have great teamwork       22%
Synergy                     57%
I have so much grit         91%
They fired that individual  11%
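
A minimal plain-Python sketch of the tokenizing step, using the five essays from the slide. The function names are hypothetical, and lowercasing plus whitespace splitting is an assumed (very crude) tokenizer; a library vectorizer would normally do this work.

```python
essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]


def build_vocabulary(texts):
    """Collect unique lowercase words in order of first appearance."""
    vocab, seen = [], set()
    for text in texts:
        for word in text.lower().split():
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab


def to_indicator_rows(texts, vocab):
    """One row per text, one 0/1 column per vocabulary word."""
    rows = []
    for text in texts:
        words = set(text.lower().split())
        rows.append([1 if w in words else 0 for w in vocab])
    return rows


vocab = build_vocabulary(essays)
X = to_indicator_rows(essays, vocab)

print(vocab[:7])  # ['i', 'want', 'to', 'work', 'here', 'have', 'great']
print(X[1][:7])   # [1, 0, 0, 0, 0, 1, 1]
```

The first seven columns reproduce the rows shown on the slide; the full matrix has one column for each of the 16 distinct words.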

Page 10: Using Deep Learning And NLP To Predict Performance From Resumes

I  want  to  work  here  have  great  PERF.
1  1     1   1     1     0     0      95%
1  0     0   0     0     1     1      22%
0  0     0   0     0     0     0      57%
1  0     0   0     0     1     0      91%
0  0     0   0     0     0     0      11%

Bag-of-words modeling: sequence and ordering are lost

BUILDING A MODEL
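
To see what "ordering is lost" means in practice, here is a small sketch. The two example sentences are made up for illustration and are not from the talk.

```python
from collections import Counter


def bag_of_words(text):
    """Count each lowercase word, ignoring where it occurs in the text."""
    return Counter(text.lower().split())


a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Same words, same counts: the model cannot tell who bit whom.
same_bag = a == b
print(same_bag)  # True
```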

Page 12: Using Deep Learning And NLP To Predict Performance From Resumes

Bigram columns: "I want", "Want to", "to go", "work here", PERF.

1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%

Band-Aid: concept of n-grams

BUILDING A MODEL
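
A short sketch of n-gram generation in plain Python. The helper name `ngrams` is hypothetical, and whitespace splitting is an assumed tokenizer.

```python
def ngrams(text, n):
    """Return the list of n-word windows in the text, in order."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = ngrams("I want to work here", 2)
print(bigrams)  # ['i want', 'want to', 'to work', 'work here']

# Unlike single words, bigrams keep some local ordering:
dog_bites = set(ngrams("the dog bit the man", 2))
man_bites = set(ngrams("the man bit the dog", 2))
print(dog_bites == man_bites)  # False
```

It is a Band-Aid because only local order is captured, and the number of columns grows quickly as n increases.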

Page 13: Using Deep Learning And NLP To Predict Performance From Resumes

SENTIMENT EXAMPLE (multiclass)

Page 14: Using Deep Learning And NLP To Predict Performance From Resumes

We need a labeled dataset; sometimes getting one with labels is the biggest challenge of all.

SENTIMENT DATASET, 1.5M TWEETS

label  text
neg    @Christian_Rocha i miss u!!!!!
pos    @llanitos there's still some St Werburghs hone...
pos    @Ashley96 it's me
neg    @Phillykidd we use to be like bestfriends
neg    Just got back from Manchester. I went to the T...
pos    @LauraDark thnks x el rt
neg    "Ughh it's so hot & the singing lady is st...
neg    @hnprashanth @dkris I was out to my native for...
pos    Girls night with the bests Wish you were here J!
neg    Just watched @paulkehler rock the crap out of ...
pos    i got the gurl! i got the ride! now im just on...
pos    @ninthspace how is the table building going?
pos    by d way guyz I must log out na see u again to...
neg    @dreday11 its only 20 mins...

Sentiment140, cs.stanford.edu  :( :)

Page 15: Using Deep Learning And NLP To Predict Performance From Resumes

Before we can process this we need to do the proper formatting to get it ready

SENTIMENT DATASET - FORMATTING

text
@Christian_Rocha i miss u!!!!!
@llanitos there's still some St Werburghs hone...
@Ashley96 it's me
@Phillykidd we use to be like bestfriends
Just got back from Manchester. I went to the T...
@LauraDark thnks x el rt
"Ughh it's so hot & the singing lady is st...
@hnprashanth @dkris I was out to my native for...
Girls night with the bests Wish you were here J!
Just watched @paulkehler rock the crap out of ...
i got the gurl! i got the ride! now im just on...
@ninthspace how is the table building going?
by d way guyz I must log out na see u again to...
@dreday11 its only 20 mins...

Python list

Page 16: Using Deep Learning And NLP To Predict Performance From Resumes

Now we can go all the way to model training and prediction

SENTIMENT DATASET – UNIGRAM

y = [0, 1, 0, 1, 1]

text_data = [['this is a tweet'], ['sounds good'], ['not really']]

I  want  to  work  here  have  great
1  1     1   1     1     0     0
1  0     0   0     0     1     1
0  0     0   0     0     0     0
1  0     0   0     0     1     0
0  0     0   0     0     0     0

Page 17: Using Deep Learning And NLP To Predict Performance From Resumes

Now we can go all the way to model training and prediction

SENTIMENT DATASET – BIGRAM

Bigram columns: "I want", "Want to", "to go", "work here"

1 1 1 1 1
1 0 0 0 0
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0

text_data = [['this is a tweet'], ['sounds good'], ['not really']]

y = [0, 1, 0, 1, 1]

Page 18: Using Deep Learning And NLP To Predict Performance From Resumes

BUILDING A MODEL

Page 19: Using Deep Learning And NLP To Predict Performance From Resumes

Convert labels to integers

SENTIMENT DATASET - FORMATTING

Python int array

label
neg
pos
pos
neg
neg
pos
neg
neg
pos
neg
pos
pos
pos
neg
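
Converting the label column to integers takes one lookup per row. In this sketch the choice of neg=0 and pos=1 is an assumption; the slide does not say which integer each label gets.

```python
# The 14 labels shown on the slide, in order.
labels = ["neg", "pos", "pos", "neg", "neg", "pos", "neg",
          "neg", "pos", "neg", "pos", "pos", "pos", "neg"]

# Assumed encoding: negative -> 0, positive -> 1.
LABEL_TO_INT = {"neg": 0, "pos": 1}

y = [LABEL_TO_INT[label] for label in labels]
print(y)  # [0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
```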

Page 20: Using Deep Learning And NLP To Predict Performance From Resumes

Convert labels to integers

SENTIMENT DATASET - FORMATTING

model.fit(X, Y)

X = [[4, 0, 0, 0, 0, 7, 0, 0, 1], [0, 0, 0, 0, 9, 0, 0, 0, 2]]
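
The slide does not say which model sits behind `model.fit(X, Y)`. As a stand-in, here is a tiny multinomial naive Bayes classifier in plain Python that exposes the same fit/predict shape. The class name `TinyNaiveBayes`, the labels `Y = [0, 1]`, and the two new rows passed to `predict` are all assumptions made for this sketch.

```python
import math


class TinyNaiveBayes:
    """Multinomial naive Bayes over count vectors, with add-one smoothing."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.log_prior_ = {}
        self.log_likelihood_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior_[c] = math.log(len(rows) / len(X))
            # Column totals for this class, plus one so no word has zero probability.
            totals = [sum(col) + 1 for col in zip(*rows)]
            denom = sum(totals)
            self.log_likelihood_[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            scores = {
                c: self.log_prior_[c]
                + sum(v * ll for v, ll in zip(x, self.log_likelihood_[c]))
                for c in self.classes_
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


X = [[4, 0, 0, 0, 0, 7, 0, 0, 1],
     [0, 0, 0, 0, 9, 0, 0, 0, 2]]
Y = [0, 1]  # assumed labels for the two rows

model = TinyNaiveBayes().fit(X, Y)
pred = model.predict([[3, 0, 0, 0, 0, 5, 0, 0, 0],
                      [0, 0, 0, 0, 4, 0, 0, 0, 1]])
print(pred)  # [0, 1]
```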

Page 21: Using Deep Learning And NLP To Predict Performance From Resumes

Now we can go all the way to model training and prediction

SENTIMENT DATASET – BUILD A MODEL

y = [0, 1, 0, 1, 1]

X = [[4, 0, 0, 0, 0, 7, 0, 0, 1], [0, 0, 0, 0, 9, 0, 0, 0, 2]]

PERFORMANCE?

Page 22: Using Deep Learning And NLP To Predict Performance From Resumes

DON’T CHEAT!

Page 23: Using Deep Learning And NLP To Predict Performance From Resumes

PROPER MODEL VALIDATION

Page 24: Using Deep Learning And NLP To Predict Performance From Resumes

We need to hold out data we can test against; this is called your validation set.

SENTIMENT DATASET – VALIDATION
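
A minimal hold-out split in plain Python. The function name `holdout_split`, the default 20% test fraction, and the toy data are assumptions for illustration.

```python
import random


def holdout_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices once, then set aside a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


# Toy data: ten rows with alternating labels.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = holdout_split(X, y, test_fraction=0.2)
print(len(X_train), len(X_test))  # 8 2
```

The model is fit only on the training rows; the held-out rows are scored once and never used to tune anything.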

Page 25: Using Deep Learning And NLP To Predict Performance From Resumes

Train on 20%, test on 80%

SENTIMENT DATASET – VALIDATION

20% 80%

Page 26: Using Deep Learning And NLP To Predict Performance From Resumes

Best score yet

SENTIMENT DATASET – VALIDATION

60% 40%

Page 27: Using Deep Learning And NLP To Predict Performance From Resumes

Best score yet

SENTIMENT DATASET – VALIDATION

70% 30%

Page 28: Using Deep Learning And NLP To Predict Performance From Resumes

Best score yet

SENTIMENT DATASET – VALIDATION

80% 20%

Page 29: Using Deep Learning And NLP To Predict Performance From Resumes

Best score yet

SENTIMENT DATASET – VALIDATION

99% 1%

Page 30: Using Deep Learning And NLP To Predict Performance From Resumes

Perfect scores

SENTIMENT DATASET – VALIDATION

99.9999% 2

Page 31: Using Deep Learning And NLP To Predict Performance From Resumes

Predict every point: k-folding

Folds = 9

Fold = 1, Fold = 2, …

Y_pred
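
A sketch of k-fold prediction in plain Python: every row is held out exactly once, so `y_pred` ends up with an out-of-sample prediction for every point. The function names and the placeholder `majority_class` model are assumptions; any fit-then-predict routine could be passed in. Folds are taken as interleaved rows, which assumes the data has already been shuffled.

```python
def kfold_predict(X, y, n_folds, fit_predict):
    """Return one out-of-fold prediction per row."""
    n = len(X)
    y_pred = [None] * n
    for fold in range(n_folds):
        test_idx = list(range(fold, n, n_folds))  # every n_folds-th row
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return y_pred


def majority_class(X_train, y_train, X_test):
    """Placeholder model: always predict the most common training label."""
    winner = max(set(y_train), key=y_train.count)
    return [winner] * len(X_test)


# Toy data: 18 rows, two thirds of them labeled 0.
X = [[i] for i in range(18)]
y = [0] * 12 + [1] * 6

y_pred = kfold_predict(X, y, n_folds=9, fit_predict=majority_class)
print(len(y_pred))  # 18
```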

Page 32: Using Deep Learning And NLP To Predict Performance From Resumes

SENTIMENT DATASET – Validation

10 folds

Page 33: Using Deep Learning And NLP To Predict Performance From Resumes

SENTIMENT DATASET – Validation

100 folds

Page 34: Using Deep Learning And NLP To Predict Performance From Resumes

BIGRAM BOOST

acc: 0.8015, r: 0.2061, AUROC: 0.8738

acc: 0.7809, r: 0.1238, AUROC: 0.8554
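
The three reported numbers can be computed from true labels and predicted scores as sketched below. The slide does not define "r"; this sketch assumes it is the Pearson correlation between scores and labels, which may not be what the speaker used. The toy labels and scores are invented for illustration.

```python
import math


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of rows where the thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, scores))
    return hits / len(y_true)


def pearson_r(y_true, scores):
    """Pearson correlation between labels and scores (assumed meaning of 'r')."""
    n = len(y_true)
    mean_t, mean_s = sum(y_true) / n, sum(scores) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(y_true, scores))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / math.sqrt(var_t * var_s)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

acc = accuracy(y_true, scores)
r = pearson_r(y_true, scores)
auc = auroc(y_true, scores)
print(acc, auc)  # 0.75 0.75
```

The pairwise AUROC loop is quadratic in the number of rows, which is fine for a sketch but too slow for 1.5M tweets; a rank-based formula would be used at that scale.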

Page 35: Using Deep Learning And NLP To Predict Performance From Resumes

Feature Creation

Model Selection

Feature Reduction

Page 36: Using Deep Learning And NLP To Predict Performance From Resumes

BETTER MODELS

acc: 0.8208, r: 0.2832, AUROC: 0.8939

acc: 0.8015, r: 0.2061, AUROC: 0.8739

Was:

Now: (10x average)

Page 37: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL CLASSIFICATION (multiclass)

Page 38: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

alt.atheism
comp.graphics
comp.os.ms-windows.misc
comp.sys.ibm.pc.hardware
comp.sys.mac.hardware
comp.windows.x
misc.forsale
rec.autos
rec.motorcycles
rec.sport.baseball
rec.sport.hockey
sci.crypt
sci.electronics
sci.med
sci.space
soc.religion.christian
talk.politics.guns
talk.politics.mideast
talk.politics.misc
talk.religion.misc

Page 39: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

From: [email protected] (where's my thing)
Subject: WHAT car is this!?
Nntp-Posting-Host: rac3.wam.umd.edu
Organization: University of Maryland, College Park
Lines: 15
MSG:
I was wondering if anyone out there could enlighten me on this car I saw
the other day. It was a 2-door sports car, looked to be from the late 60s/
early 70s. It was called a Bricklin. The doors were really small. In addition,
the front bumper was separate from the rest of the body. This is
all I know. If anyone can tellme a model name, engine specs, years
of production, where this car is made, history, or whatever info you
have on this funky looking car, please e-mail.

Thanks,
- IL
 ---- brought to you by your neighborhood Lerxst ----

rec.autos

Page 40: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

From: [email protected] (Guy Kuo)
Subject: SI Clock Poll - Final Call
Summary: Final call for SI clock reports
Keywords: SI, acceleration, clock, upgrade
Article-I.D.: shelley.1qvfo9INNc3s
Organization: University of Washington
Lines: 11
NNTP-Posting-Host: carson.u.washington.edu
MSG:
A fair number of brave souls who upgraded their SI clock oscillator have
shared their experiences for this poll. Please send a brief message detailing
your experiences with the procedure. Top speed attained, CPU rated speed,
add on cards and adapters, heat sinks, hour of usage per day, floppy disk
functionality with 800 and 1.4 m floppies are especially requested.

I will be summarizing in the next two days, so please add to the network
knowledge base if you have done the clock upgrade and haven't answered this
poll. Thanks.

Guy Kuo <[email protected]>

comp.sys.mac.hardware

Page 41: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

From: jgreen@amber (Joe Green)
Subject: Re: Weitek P9000 ?
Organization: Harris Computer Systems Division
Lines: 14
Distribution: world
NNTP-Posting-Host: amber.ssd.csd.harris.com
X-Newsreader: TIN [version 1.1 PL9]
MSG:
Robert J.C. Kyanko ([email protected]) wrote:
>[email protected]<[email protected]>:
>> Anyone know about the Weitek P9000 graphics chip?
> As far as the low-level stuff goes, it looks pretty nice. It's got this
> quadrilateral fill command that requires just the four points.

Do you have Weitek's address/phone number? I'd like to get some information
about this chip.

--
Joe Green            Harris Corporation
[email protected]    Computer Systems Division
"The only thing that really scares me is a person with no sense of humor."
                     -- Jonathan Winters

comp.graphics

Page 42: Using Deep Learning And NLP To Predict Performance From Resumes

EMAIL MULTICLASS DATASET (20 classes)

Page 43: Using Deep Learning And NLP To Predict Performance From Resumes

RESUME MODELING

(binary)

Page 44: Using Deep Learning And NLP To Predict Performance From Resumes

Upload Your Resume

Now painstakingly fill out this form containing all of the exact same information

Page 45: Using Deep Learning And NLP To Predict Performance From Resumes

Document modeling review

UNSTRUCTURED

STRUCTURED

MUNGED

Page 46: Using Deep Learning And NLP To Predict Performance From Resumes

Resume Extension

Page 47: Using Deep Learning And NLP To Predict Performance From Resumes

Resume format consolidation

Page 48: Using Deep Learning And NLP To Predict Performance From Resumes

GPA Inclusion (18%)

Page 49: Using Deep Learning And NLP To Predict Performance From Resumes

GPA Replacement

Page 50: Using Deep Learning And NLP To Predict Performance From Resumes

Mimicking the human recruiter

Feature Hunt

ONE FEATURE AT A TIME

INCREMENTAL GAINS

Page 51: Using Deep Learning And NLP To Predict Performance From Resumes

DEEP LEARNING

Page 52: Using Deep Learning And NLP To Predict Performance From Resumes

ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE; USING DEEP LEARNING TO AUTOMATE

AUTOMATIC FEATURE GENERATION

Unstructured

ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual

Structured

Bigram columns: "I want", "Want to", "to go", "work here", PERF.

1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%

Page 53: Using Deep Learning And NLP To Predict Performance From Resumes

ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE; USING DEEP LEARNING TO AUTOMATE

AUTOMATIC FEATURE GENERATION

RAW TEXT

ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual

WORD SEQUENCE

ESSAY
3 2 1 4 5
3 7 67 345
54
3 7 99 102 34
78 203 501 14

ENCODING

1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0

LSTM
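
The raw text to word sequence to encoding steps can be sketched in plain Python as follows. The function names are hypothetical, and the ids are assigned by frequency rank within these five essays, so they will not match the numbers on the slide (which presumably come from a larger vocabulary). The LSTM itself is not shown; in a deep learning framework the integer sequences would typically be padded to equal length and fed to the network.

```python
from collections import Counter

essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]


def build_word_index(texts):
    """Assign integer ids starting at 1, most frequent words first."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    return {word: i for i, word in enumerate(ranked, start=1)}


def encode(text, word_index):
    """Turn a text into its sequence of word ids, preserving word order."""
    return [word_index[w] for w in text.lower().split()]


def one_hot(sequence, vocab_size):
    """One row per word id, with a single 1 in that id's column."""
    return [[1 if col == idx else 0 for col in range(1, vocab_size + 1)]
            for idx in sequence]


word_index = build_word_index(essays)
sequences = [encode(text, word_index) for text in essays]

print(sequences[0])          # [1, 15, 14, 16, 6]
print(one_hot([3, 1], 5))    # [[0, 0, 1, 0, 0], [1, 0, 0, 0, 0]]
```

Unlike the bag-of-words columns, each sequence keeps the words in their original order, which is what lets a sequence model use ordering.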

Page 54: Using Deep Learning And NLP To Predict Performance From Resumes

AUTOMATIC FEATURE GENERATION

Page 55: Using Deep Learning And NLP To Predict Performance From Resumes

AUTOMATIC FEATURE GENERATION

Page 56: Using Deep Learning And NLP To Predict Performance From Resumes

AUTOMATIC FEATURE GENERATION

Page 57: Using Deep Learning And NLP To Predict Performance From Resumes

BEGIN SCRATCHING AT LAYOUT

AUTOMATIC FEATURE GENERATION (LAYOUT)

CNN: bit.ly/pacon

Page 58: Using Deep Learning And NLP To Predict Performance From Resumes

INTERVIEW MODELING

Page 59: Using Deep Learning And NLP To Predict Performance From Resumes

WOULD YOU EVER HIRE FROM JUST A RESUME?

INTERVIEW MODELING

SOFT/TECHNICAL COMPETENCIES

Resume can overstate and understate

Page 60: Using Deep Learning And NLP To Predict Performance From Resumes

Audio Video Text

Page 61: Using Deep Learning And NLP To Predict Performance From Resumes

QUESTIONS