Text mining applications in job analysis


Page 1: Job Analytics
Vladimer Kobayashi (v.kobayashi@uva.nl), Stefan Mol, Gábor Kismihók, and Deanne den Hartog
Amsterdam Business School / EDUWORKS

Presented during Amsterdam Data Science Meet-up

eduworks-network.eu
facebook.com/eduworksnetwork
@EduworksNetwork

This project has been funded with support from the European Commission. This communication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Page 2: Overview
• Hi my name is…
• Talent Management Process
• Job Analysis
• The dawn of Job Analytics
• Framework
• Vacancy Mining
• Data Sources
• Applications
• Sample Applications

Page 3: About me
• PhD student at the University of Amsterdam (3 months to go)
• PhD topic: Labour Market driven Learning Analytics

Page 4: Talent management process
1. Decide what positions to fill
2. Recruiting
3. Candidates complete application forms and initial screening
4. Use selection tools
5. Decide to whom to make an offer
6. Orient, train, and develop employees
7. Appraise employees
8. Reward and compensate employees

Page 5: Job Analysis
• The procedure through which you determine the duties of the positions and the characteristics of the people to hire for them.
• Job analysis is the starting point of many HR practices

Page 6: Job Analysis
• Job Description and Job Specification
• Training requirements
• Job evaluation (wage and salary decisions)
• Performance Appraisal
• Recruiting and selection decisions

Page 7: Traditional Methods for Collecting Job Information
• Interview
  • Each employee or group of employees
• Questionnaires
• Observation
  • Direct observation
• Diary/Logs
• Drawbacks
  • Time consuming and sometimes expensive
  • Collecting information from geographically dispersed employees can be challenging
  • Keeping information up-to-date is a challenge

Page 8: Along came Job Analytics
• Subfield of HR analytics
• Use of analytics to supplement traditional methods for collecting and analysing data for job analysis
• Factors affecting jobs (economic, environmental, technological, immigration, workforce, etc.)
• Applications
• Metrics (ROI, cost-benefit, and impact analysis)

The recently concluded EDUWORKS (http://www.eduworks-network.eu/) project is currently at the forefront of this field

Page 9: Job analytics: Vacancy mining and CV mining
• Vacancy Mining
  • Automatic extraction of job information from vacancies (done as early as 1997)
  • Mostly through keyword extraction (i.e. someone has to supply the keywords)
• CV mining
  • Automatic extraction of job analysis information from CVs
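The keyword-based extraction mentioned on this slide can be sketched roughly as below. This is only an illustration of the general idea, not the project's code: the keyword list, the function name `extract_keywords`, and the sample vacancy text are all hypothetical.

```python
import re

# Hypothetical, hand-supplied keyword list (someone has to provide it).
SKILL_KEYWORDS = {"python", "communication", "sql", "teamwork"}

def extract_keywords(vacancy_text, keywords=SKILL_KEYWORDS):
    """Return the supplied keywords that occur in the vacancy, sorted."""
    tokens = set(re.findall(r"[a-z]+", vacancy_text.lower()))
    return sorted(tokens & keywords)

sample = "We need strong communication skills and experience with Python and SQL."
found = extract_keywords(sample)
```

The sketch also shows the limitation named on the slide: anything not in the supplied list is never found.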


Page 11: Vacancy mining
Useful for…
• Job Analysis
• Insight into employment opportunities
• Current skill requirements
• Skill changes and trends (for certain jobs)
• Evaluating and renewing teaching and informing curriculum development

Page 12: Vacancy mining

Page 13: What is in a vacancy*?
• Worker-oriented domain
  1. Worker characteristics
  2. Worker requirements
  3. Experience requirements
• Job-oriented domain
  1. Occupational requirements
  2. Workforce characteristics
  3. Occupation-specific information

*Based on the O*NET Content Model

Page 14: Approach 1
[Flow diagram]
• Main pipeline: Vacancies (labelled) → Preprocessing and Segmentation → Sentences → Feature Extraction → Feature matrix → Classification model (Random Forest, SVM, and Naive Bayes) → Classified sentences → Validation
• Active learning loop: Hard to classify sentences → Query by committee → Expert → Newly expert labelled sentences → Retrain
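One common way to pick "hard to classify" sentences with query by committee is to rank sentences by how much the committee members disagree, for example with vote entropy. The sketch below assumes that measure; the slide does not say which disagreement measure was used, and the label names and votes are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's label votes for one sentence (0 = full agreement)."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def hardest_sentences(committee_votes, n=1):
    """Return indices of the n sentences the committee disagrees on most."""
    ranked = sorted(range(len(committee_votes)),
                    key=lambda i: vote_entropy(committee_votes[i]),
                    reverse=True)
    return ranked[:n]

# Hypothetical votes from three classifiers (e.g. Random Forest, SVM, Naive Bayes).
votes = [
    ["activity", "activity", "activity"],    # full agreement
    ["activity", "attribute", "other"],      # maximal disagreement
    ["attribute", "attribute", "activity"],  # partial disagreement
]
to_label = hardest_sentences(votes, n=2)
```

The selected sentences would then go to the expert for labelling before the models are retrained.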

Page 15: Preprocessing
• Punctuation removal
• Lower case
• Sentence segmentation
• Stopword removal
  • We do not remove these stopwords: "to", "have", "has", "had", "must", "can", "could", "may", "might", "shall", "should", "will", and "would"
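A minimal sketch of these preprocessing steps follows. It assumes a regex-based sentence splitter and a tiny illustrative stopword list (both hypothetical); only the list of retained words comes from the slide. Sentences are split before punctuation is removed, since the sentence boundaries would otherwise be lost.

```python
import re
import string

# Stopwords the slide says are kept.
KEPT = {"to", "have", "has", "had", "must", "can", "could", "may",
        "might", "shall", "should", "will", "would"}
# Hypothetical small stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "are", "to", "will", "have"}
REMOVED = STOPWORDS - KEPT
STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def preprocess(text):
    """Split into sentences, then lowercase, strip punctuation, drop stopwords."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    cleaned = []
    for sentence in sentences:
        no_punct = sentence.lower().translate(STRIP_PUNCT)
        tokens = [t for t in no_punct.split() if t not in REMOVED]
        if tokens:
            cleaned.append(tokens)
    return cleaned

result = preprocess("You will manage the team. Must have a degree in Economics!")
```

Keeping "to" and the modal verbs matters because later features (proportion of "to", proportion of modal verbs) depend on them.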

Page 16:
Feature Type | Number of derived features | Variable Type
Part of speech (POS) tag of the first word | 1 | Categorical (actual POS)
Is the first word in this sentence unique to work activity sentences (based on the labelled data) | 1 | Numeric
Is the first word in this sentence unique to worker attribute sentences (based on the labelled data) | 1 | Numeric
Is the last word in this sentence unique to work activity sentences (based on the labelled data) | 1 | Numeric
Is the last word in this sentence unique to worker attribute sentences (based on the labelled data) | 1 | Numeric

Page 17:
Feature Type | Number of derived features | Variable Type
Proportion of adjectives | 1 | Numeric
Proportion of verbs | 1 | Numeric
Proportion of word "to" | 1 | Numeric
Proportion of modal verbs | 1 | Numeric
Proportion of numbers | 1 | Numeric
Proportion of adverbs | 1 | Numeric

Page 18:
Feature Type | Number of derived features | Variable Type
Proportion of nouns | 1 | Numeric
Proportion of nouns, verbs, adjectives, adverbs, and other part of speech tags followed by another verb | 5 | Numeric
Proportion of unique words found only in work activity sentences (based on the labelled data) | 1 | Numeric
Proportion of unique words found only in worker attribute sentences (based on the labelled data) | 1 | Numeric
Frequency of keywords for work activity and worker attribute sentences | 149 | Numeric
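A few of the features in these tables can be computed without a part-of-speech tagger. The sketch below covers only those (proportions of "to", modal verbs, and numbers, plus the "unique to a class" indicators); the POS-based features are left out. Function names, feature names, and the labelled sentences are hypothetical.

```python
MODALS = {"must", "can", "could", "may", "might", "shall", "should", "will", "would"}

def words_unique_to(label, labelled):
    """Words that occur in sentences with `label` and in no other class."""
    inside, outside = set(), set()
    for tokens, lab in labelled:
        (inside if lab == label else outside).update(tokens)
    return inside - outside

def simple_features(tokens, activity_only, attribute_only):
    """Tagger-free subset of the sentence features for one tokenised sentence."""
    n = len(tokens)
    return {
        "prop_to": tokens.count("to") / n,
        "prop_modal": sum(t in MODALS for t in tokens) / n,
        "prop_numbers": sum(t.isdigit() for t in tokens) / n,
        "first_unique_activity": int(tokens[0] in activity_only),
        "last_unique_attribute": int(tokens[-1] in attribute_only),
        "prop_activity_only": sum(t in activity_only for t in tokens) / n,
    }

# Hypothetical labelled sentences (already preprocessed).
labelled = [
    (["manage", "the", "team"], "activity"),
    (["must", "have", "5", "years", "experience"], "attribute"),
    (["lead", "the", "projects"], "activity"),
]
activity_only = words_unique_to("activity", labelled)
attribute_only = words_unique_to("attribute", labelled)
feats = simple_features(["must", "manage", "5", "projects"], activity_only, attribute_only)
```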

Page 19: Approach 2 (Kim, 2014)
• Embed words in multiple dimensions
• Train CNN (on top of word2vec) to classify sentences

Page 20: Word2vec (skip-gram model)
Image credit: McCormick (2016)
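In the skip-gram model, each word is used to predict the words around it, so the training data consists of (centre word, context word) pairs drawn from a sliding window. A minimal sketch of that pair generation, with a hypothetical function name and example sentence:

```python
def skipgram_pairs(tokens, window=2):
    """Return (centre, context) training pairs within `window` words of each centre."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

pairs = skipgram_pairs(["excellent", "communication", "skills"], window=1)
```

Only the data preparation is shown; training the embedding itself is not.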

Page 21: Words similar to communication
Word | Cosine similarity
interpersonal | 0.90
verbal | 0.90
skills | 0.88
written | 0.85
strong | 0.84
excellent | 0.83
good | 0.83
communicator | 0.81
ability | 0.80
organisational | 0.80
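The similarity scores in this table are cosine similarities between word vectors. A small sketch of the computation, using made-up 3-dimensional vectors (the talk reports 300 dimensions); the vectors and function names are hypothetical:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, top_n=2):
    """Rank all other words by cosine similarity to `word`."""
    scores = [(other, cosine_similarity(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical toy vectors, not taken from the trained model.
vectors = {
    "communication": [1.0, 0.9, 0.1],
    "verbal": [0.9, 1.0, 0.0],
    "travel": [0.0, 0.1, 1.0],
}
neighbours = most_similar("communication", vectors)
```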

Page 22: Words similar to travel
Word | Cosine similarity
overseas | 0.68
travelling | 0.66
occasionally | 0.64
international | 0.64
locations | 0.64
europe | 0.62
abroad | 0.54
car | 0.52
flexible | 0.52
visits | 0.50
essential | 0.50

Page 23: CNN
Image credit: Kim (2014)
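The core operation in a sentence-classification CNN of this kind is a filter that slides over windows of consecutive word vectors, followed by max-over-time pooling, which keeps the strongest response per filter. The sketch below shows a single filter with no bias term and assumes a ReLU activation; the embeddings and filter weights are hypothetical, and the sentence must contain at least as many words as the filter is wide.

```python
def conv_max_pool(sentence_matrix, filt):
    """Slide one filter over consecutive word windows and keep the max response.

    sentence_matrix: list of word vectors (one list of floats per word).
    filt: list of weight vectors, one per word in the window; its length is
    the filter width.
    """
    width = len(filt)
    responses = []
    for start in range(len(sentence_matrix) - width + 1):
        window = sentence_matrix[start:start + width]
        activation = sum(w * x
                         for w_row, x_row in zip(filt, window)
                         for w, x in zip(w_row, x_row))
        responses.append(max(0.0, activation))  # ReLU (assumed)
    return max(responses)  # max-over-time pooling

# Hypothetical 2-dimensional embeddings for a 4-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filter_width_2 = [[1.0, 0.0], [0.0, 1.0]]
feature = conv_max_pool(sentence, filter_width_2)
```

In a full model, many such filters of several widths each produce one pooled feature, and the concatenated features feed a final classification layer.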

Page 24: CPU vs GPU
Type | Run time | Training accuracy (10-fold cross validation) | Test accuracy
CPU (i7-4790 CPU @ 3.60GHz) | 30 days | 99.990% | 99.807%
GPU (NVIDIA GeForce GTX 745) | ~4.5 days | 99.992% | 99.782%

• Embedding dimension: 300
• Convolution filters: 3, 4, 5
• Batch size: 50
• Dropout probability: 0.50
• Number of epochs: 50

Power? Energy?

Page 25: Validation
• Cross-validation
• Compare with independent expert
• Compare with task inventory
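For the cross-validation step, the labelled sentences are split into k folds, and each fold serves once as the test set while the rest is used for training. A minimal sketch of the index bookkeeping, with a hypothetical function name; it does not shuffle, which a real run normally would:

```python
def k_fold_indices(n_items, k):
    """Split range(n_items) into k (train, test) index pairs of near-equal size."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits

splits = k_fold_indices(n_items=10, k=5)
```

With `k=10` this corresponds to the 10-fold setting reported for the CNN runs.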

Page 26: Demo
https://vkobayashi.shinyapps.io/labelme/
Just copy-paste a vacancy (any language)

Page 27: Key worker attributes from topic modeling (number of topics)
Cao Juan, Xia Tian, Li Jintao, Zhang Yongdong, and Tang Sheng. 2009. A density-based method for adaptive LDA model selection. Neurocomputing (16th European Symposium on Artificial Neural Networks 2008) 72, 7–9: 1775–1781. http://doi.org/10.1016/j.neucom.2008.06.011
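A simplified reading of the criterion in the cited paper is to prefer the number of topics for which the topics are, on average, least similar to each other. The sketch below assumes that reading and uses average pairwise cosine similarity between topic-word distributions; it is not the authors' implementation, and the topic-word matrices are hypothetical stand-ins for fitted LDA models.

```python
import math
from itertools import combinations

def average_topic_similarity(topic_word):
    """Mean cosine similarity over all pairs of topic-word distributions."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
    pairs = list(combinations(topic_word, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical topic-word matrices from LDA fits with 2 and 3 topics.
candidates = {
    2: [[0.5, 0.5, 0.0, 0.0], [0.4, 0.4, 0.1, 0.1]],
    3: [[0.9, 0.1, 0.0, 0.0], [0.0, 0.1, 0.9, 0.0], [0.0, 0.0, 0.1, 0.9]],
}
scores = {k: average_topic_similarity(tw) for k, tw in candidates.items()}
best_k = min(scores, key=scores.get)  # lower average similarity is preferred
```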

Page 28: Key worker attributes from topic modeling (Topics)
• Topic 100: development, software, agile, methodologies, application, scrum, design, life
• Topic 86: new, learn, quickly, willingness, adapt, technologies, internet, desire
• Topic 132: travel, willingness, willing, work, time, needed, internationally, international
• Topic 20: sales, selling, salesforcecom, outside, crm, success, account, inside
• Topic 75: communication, written, oral, verbal, interpersonal, presentation, effective, listening
• Topic 18: highly, motivated, oriented, self, driven, organized, starter, selfstarter

Page 29:
• Topic 61: license, valid, drivers, driving, record, transportation, reliable, vehicle
• Topic 16: data, analysis, quantitative, research, statistics, economics, statistical, modeling
• Topic 60: scripting, python, linux, programming, java, perl, languages, unix

Kobayashi, V. B., Berkers, H. A., Mol, S. T., Kismihok, G., & Den Hartog, D. N. (2017). Text Mining in Organizational Research. Organizational Research Methods.

Page 30: Job clusters according to worker attributes
[Figures: Job Cluster 98 and Job Cluster 86]

Page 31: Applications
• Task Analysis (with Expert Validation) done in collaboration with the Pro-Nursing Project
http://www.pro-nursing.eu/web/

Page 32: Applications
• Hybrid Teachers in collaboration with Hybride Docent and de baaninggenieurs
https://vkobayashi.shinyapps.io/analysis_en_vis/

Page 33: Other Applications
• Job Test Validation

Page 34: CV Mining – a glimpse (with USG People and Endouble)
• Given education, what is the most likely career progression?
• Use education and employment history to predict the next job
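One very simple baseline for next-job prediction is to count how often one job follows another across candidates and predict the most frequent successor. The sketch below assumes that first-order baseline purely for illustration; the slides do not describe the model that was actually used, education is ignored here, and the histories are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(histories):
    """Count how often each job is followed by each other job."""
    transitions = defaultdict(Counter)
    for jobs in histories:
        for current, following in zip(jobs, jobs[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, last_job):
    """Most frequent successor of last_job, or None if it was never seen."""
    if last_job not in transitions:
        return None
    return transitions[last_job].most_common(1)[0][0]

# Hypothetical employment histories, oldest job first.
histories = [
    ["telefonist", "claimbehandelaar", "beslisser claim"],
    ["telefonist", "claimbehandelaar", "teamleider"],
    ["medewerker", "claimbehandelaar", "beslisser claim"],
]
model = fit_transitions(histories)
prediction = predict_next(model, "claimbehandelaar")
```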

Page 35: Example: Candidate id number 135994
• Education: "atheneum", "conservatorium saxofoon dmlichtemuziek", "cursus logic pro 9 professionelemuziekproductie", "cursus nlp neurolinguïstischprogrammeren practitioner", "cursus tot beoordelend", "cursus tot claimbehandelaar voor de ziektewet"
• Employment: "veel", "productie en fabriekswerk", "taxichauffeur", "telefonist", "claimbehandelaar afdeling ziektewet", "medewerker", "beslisser claim afdeling werkloosheidswet", "saxofoondocent", "saxofoondocent", "postkamermedewerker"

Page 36: Challenge
• How to deal with huge numbers of job titles and education titles?
  • Deal with function classes instead of job titles
  • 22 function classes
    • Defensie (defence), Vervoer en transport (transport), Horeca (hospitality), Productie (production), etc.

Page 37: Results: Example cluster
1. assistentaccountant
2. assistent accountant
3. beginnend assistent accountant
4. assistenteaccountant
5. accountant assistent
6. officier assistent accountant
7. assistent accountant inboeken
8. account assistent
9. accountassistente
10. assistent accounting
11. 1e assistent accountant
12. accounting assistent
13. assistent accountantc
14. assistent accountancy
15. assistent accountant ai
16. assistent accountant ab
17. ervaren assistent accountant
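The slides do not state how the job titles were grouped. One plausible ingredient, shown here only as an assumption, is a string similarity that tolerates missing spaces and word-order changes, such as Jaccard similarity over character trigrams. The function names are hypothetical, and titles shorter than three characters are not handled.

```python
def char_ngrams(title, n=3):
    """Set of character n-grams of a title, ignoring spaces."""
    compact = title.replace(" ", "")
    return {compact[i:i + n] for i in range(len(compact) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of the trigram sets of two titles."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)

same = jaccard("assistentaccountant", "assistent accountant")
close = jaccard("assistent accountant", "accountant assistent")
far = jaccard("assistent accountant", "schoonmaker")
```

Titles whose similarity exceeds some threshold could then be merged into one cluster by any standard clustering method.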

Page 38:
[1] "schoonmaker" [2] "schoonmaker ouderlijk bedrijf" [3] "schoonmaker vnl" [4] "schoonmaker spoelkeuken" [5] "schoonmaker horecabedrijven" [6] "schoonmaken" [7] "schoonmaker service" [8] "schoonmaker scholen" [9] "schoonmaker keukenhulp" [10] "schoonmaker produktie" [11] "schoonmaker bezorger" [12] "vegen schoonmaken" [13] "schoonmaker droge schone iss" [14] "schoonmaker schoonmaken" [15] "ok schoonmaker" [16] "masseur schoonmaker" [17] "schoonmaak fabriek schoonmaken" [18] "schoonmaker nederland" [19] "afwasserschoonmaken" [20] "schoonmaker fa" [21] "schoonmaker horeca" [22] "keukenhulp schoonmaak" [23] "hectas schoonmaken"

[1] "directiesecretaresseteamleidercommerciële administratie" [2] "directiesecretaresse administratief" [3] "directie secretaresse administratief" [4] "directiesecretaresse loonadministratrice" [5] "stagiaire directie secretaresse en administratief" [6] "administratie secretaresse" [7] "stagiaire secretarieel administratief" [8] "directiesecretaressehoofd administratie" [9] "administratiefmedewerkerdirectiesecretaresse" [10] "administratie secretaresse directie secretaresse" [11] "administratiefdirectiesecretariaat" [12] "administratiefdirectiesecretaresse" [13] "administratie directie secretaresse"

Page 39:
[2] "coordinator sponsoring" [3] "facilities coordinator" [4] "campagne coordinator" [5] "order coordinator" [8] "management coordinator" [9] "response coordinator" [10] "banquet sales coordinator" [11] "teamleider coordinator" [12] "coordinator new business team" [13] "coordinator" [15] "steward team coordinator" [17] "sales coordinator export" [18] "sales coordinator" [19] "team coordinator surgical" [20] "procurement coordinator export" [21] "freelance buscoordinator" [22] "event coordinator" [23] "coordinator emplooi" [24] "allio coordinator tso" [25] "program coordinator" [26] "programme coordinator" [27] "events coordinator" [28] "event and sponsorship coordinator"

[1] "kapster mede eigenaresse" [2] "eigenaresse" [3] "mede eigenaresse" [4] "eigenaresse winkel" [5] "eigenaresse kledingwinkel" [6] "medeeigenares" [7] "eigenaresse laifde" [8] "eigenaresse van slagerij en delicatessenwinkel" [9] "medeeigenaresse" [10] "eigenaresse kinderboetiek"

Page 40: Towards a good matching (with Endouble and USG People)
Job characteristics
• Role requirements
Individual characteristics
• Education
• Previous job experience
• Gender, age
• Other

Page 41: References
• Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882 [cs]. Retrieved from http://arxiv.org/abs/1408.5882
• Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. Retrieved from http://arxiv.org/abs/1301.3781
• McCormick, C. (2016, April 19). Word2Vec Tutorial - The Skip-Gram Model. Retrieved from http://www.mccormickml.com
• https://mxnet.incubator.apache.org/tutorials/nlp/cnn.html
• Kobayashi, V. B., Berkers, H. A., Mol, S. T., Kismihok, G., & Den Hartog, D. N. (2017). Text Mining in Organizational Research. Organizational Research Methods. Manuscript in Preparation.
• Kobayashi, V. B., Mol, S. T., Berkers, H. A., Kismihok, G., & Den Hartog, D. N. (2017). Text Classification for Organizational Research: A Tutorial. Organizational Research Methods.
• Kobayashi, V., Mol, S. T., Kismihok, G., & Hesterberg, M. (2017). Automatic Extraction of Nursing Tasks from Online Job Vacancies. In M. Fathi, M. Khobreh, & F. Ansari (Eds.), Professional Education and Training through Knowledge, Technology and Innovation (pp. 51-56). Siegen, Germany: Universitatsverlag Siegen.

http://www.eduworks-network.eu/

Page 42:
This work was supported by the European Commission through the Marie-Curie Initial Training Network EDUWORKS (grant number PITN-GA-2013-608311)

We are forever grateful to…