eduworks-network.eu
facebook.com/eduworksnetwork
@EduworksNetwork
This project has been funded with support from the European Commission. This communication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
Job Analytics
Vladimer Kobayashi ([email protected]), Stefan Mol, Gábor Kismihók, and Deanne den Hartog
Amsterdam Business School / EDUWORKS
Presented during Amsterdam Data Science Meet-up
Overview
• Hi, my name is…
• Talent Management Process
• Job Analysis
• The dawn of Job Analytics
• Framework
• Vacancy Mining
• Data Sources
• Applications
• Sample Applications
Eduworks Project 2
About me
• PhD student at the University of Amsterdam (3 months to go)
• PhD topic: Labour Market driven Learning Analytics
Talent management process
1. Decide what positions to fill
2. Recruiting
3. Candidates complete application forms and initial screening
4. Use selection tools
5. Decide to whom to make an offer
6. Orient, train, and develop employees
7. Appraise employees
8. Reward and compensate employees
Job Analysis
• The procedure through which you determine the duties of the positions and the characteristics of the people to hire for them.
• Job analysis is the starting point of many HR practices.
Job Analysis
Job Analysis → Job Description and Job Specification, which feed into:
• Recruiting and selection decisions
• Performance appraisal
• Job evaluation (wage and salary decisions)
• Training requirements
Traditional Methods for Collecting Job Information
• Interview
  • Each employee or group of employees
• Questionnaires
• Observation
  • Direct observation
• Diary/Logs
• Drawbacks
  • Time-consuming and sometimes expensive
  • Collecting information from geographically dispersed employees can be challenging
  • Keeping information up to date is a challenge
Along came Job Analytics
• A subfield of HR analytics
• Uses analytics to supplement traditional methods for collecting and analysing data for job analysis
• Factors affecting jobs (economic, environmental, technological, immigration, workforce, etc.)
• Applications
• Metrics (ROI, cost-benefit, and impact analysis)
The recently concluded EDUWORKS project (http://www.eduworks-network.eu/) has been at the forefront of this field.
Job analytics: Vacancy mining and CV mining
• Vacancy mining
  • Automatic extraction of job information from vacancies (done as early as 1997)
  • Mostly through keyword extraction (i.e. someone has to supply the keywords)
• CV mining
  • Automatic extraction of job analysis information from CVs
Vacancy mining
Useful for…
• Job analysis
• Insight into employment opportunities
• Current skill requirements
• Skill changes and trends (for certain jobs)
• Evaluating and renewing teaching, and informing curriculum development
Vacancy mining
What is in a vacancy*?
• Worker-oriented domain
  1. Worker characteristics
  2. Worker requirements
  3. Experience requirements
• Job-oriented domain
  1. Occupational requirements
  2. Workforce characteristics
  3. Occupation-specific information
*Based on the O*NET Content Model
Approach 1
1. Vacancies (labelled) → Preprocessing and segmentation → Sentences
2. Sentences → Feature extraction → Feature matrix
3. Feature matrix → Classification model (Random Forest, SVM, and Naive Bayes) → Classified sentences → Validation
4. Hard-to-classify sentences → Query by committee → Expert → Newly expert-labelled sentences → Retrain
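Query by committee picks the sentences on which the classifiers disagree most and sends those to the expert. The slides do not say how disagreement was measured; the sketch below assumes vote entropy, one common choice, and uses made-up sentences and hypothetical label names ("activity", "attribute").

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's votes for one sentence: 0 when all members agree."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in Counter(votes).values())


def select_hard_sentences(sentences, committee_votes, k):
    """Return the k sentences with the highest vote entropy (most disagreement).

    committee_votes[i] lists the label each committee member (e.g. Random Forest,
    SVM and Naive Bayes) predicted for sentences[i].
    """
    ranked = sorted(
        zip(sentences, committee_votes),
        key=lambda pair: vote_entropy(pair[1]),
        reverse=True,
    )
    return [sentence for sentence, _ in ranked[:k]]


sentences = ["develop web applications", "strong communication skills", "travel occasionally"]
committee_votes = [
    ["activity", "activity", "activity"],
    ["attribute", "attribute", "attribute"],
    ["activity", "attribute", "attribute"],
]
print(select_hard_sentences(sentences, committee_votes, k=1))  # ['travel occasionally']
```

The selected sentences would then be labelled by the expert and added to the training set before retraining.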
Preprocessing
• Punctuation removal
• Lower case
• Sentence segmentation
• Stopword removal
  • We do not remove these stopwords: "to", "have", "has", "had", "must", "can", "could", "may", "might", "shall", "should", "will", and "would"
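A minimal sketch of these preprocessing steps. The base stopword list and the regex sentence splitter are assumptions (the slides give only the list of words that are kept). Segmentation is done before punctuation removal, because sentence-final punctuation is what marks the boundaries.

```python
import re

# Illustrative base stopword list (an assumption: the full list used is not given).
BASE_STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "in", "on", "for", "with", "is", "are",
    "be", "to", "have", "has", "had", "must", "can", "could", "may", "might",
    "shall", "should", "will", "would",
}
# Words deliberately kept, as listed on the slide: they often signal
# requirements or duties ("must have", "should be able to").
KEEP = {"to", "have", "has", "had", "must", "can", "could", "may", "might",
        "shall", "should", "will", "would"}
STOPWORDS = BASE_STOPWORDS - KEEP


def split_sentences(text):
    """Naive segmentation: split after ., ! or ? followed by whitespace, and at line breaks."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def preprocess(text):
    """Segment, then lower-case, strip punctuation and drop stopwords not in KEEP."""
    sentences = []
    for sentence in split_sentences(text):
        cleaned = re.sub(r"[^\w\s]", " ", sentence.lower())
        tokens = [t for t in cleaned.split() if t not in STOPWORDS]
        if tokens:
            sentences.append(tokens)
    return sentences


vacancy = "You will develop web applications. Candidates must have strong communication skills!"
print(preprocess(vacancy))
# [['you', 'will', 'develop', 'web', 'applications'], ['candidates', 'must', 'have', 'strong', 'communication', 'skills']]
```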
Feature | Number of derived features | Variable type
Part-of-speech (POS) tag of the first word | 1 | Categorical (actual POS)
Is the first word in this sentence unique to work activity sentences (based on the labelled data)? | 1 | Numeric
Is the first word in this sentence unique to worker attribute sentences (based on the labelled data)? | 1 | Numeric
Is the last word in this sentence unique to work activity sentences (based on the labelled data)? | 1 | Numeric
Is the last word in this sentence unique to worker attribute sentences (based on the labelled data)? | 1 | Numeric
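The "unique to" features can be sketched as follows, assuming that "unique" means the word occurs in labelled sentences of one class and never in the other. The same exclusive vocabulary also yields the two "proportion of unique words" features listed further on. The toy sentences and the label names "activity" and "attribute" are hypothetical.

```python
def exclusive_vocab(labelled):
    """Map each class label to the words that occur only in sentences of that class.

    `labelled` is a list of (tokens, label) pairs from the expert-labelled data.
    """
    vocab = {}
    for tokens, label in labelled:
        vocab.setdefault(label, set()).update(tokens)
    exclusive = {}
    for label, words in vocab.items():
        others = set().union(*(v for other, v in vocab.items() if other != label))
        exclusive[label] = words - others
    return exclusive


def uniqueness_features(tokens, exclusive):
    """First-word, last-word and proportion features for one tokenised sentence."""
    feats = {}
    for label, words in exclusive.items():
        feats[f"first_unique_{label}"] = int(tokens[0] in words)
        feats[f"last_unique_{label}"] = int(tokens[-1] in words)
        feats[f"prop_unique_{label}"] = sum(t in words for t in tokens) / len(tokens)
    return feats


labelled = [
    (["develop", "web", "applications"], "activity"),
    (["maintain", "web", "servers"], "activity"),
    (["strong", "communication", "skills"], "attribute"),
    (["excellent", "web", "skills"], "attribute"),
]
exclusive = exclusive_vocab(labelled)
print(uniqueness_features(["develop", "communication", "skills"], exclusive))
```

"web" appears in both classes, so it is exclusive to neither and never triggers these features.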
Feature | Number of derived features | Variable type
Proportion of adjectives | 1 | Numeric
Proportion of verbs | 1 | Numeric
Proportion of the word "to" | 1 | Numeric
Proportion of modal verbs | 1 | Numeric
Proportion of numbers | 1 | Numeric
Proportion of adverbs | 1 | Numeric
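The proportion features are simple counts divided by sentence length. The sketch assumes the tokens already carry Penn Treebank-style tags from some POS tagger (which tagger was used is not stated), and the example sentence is hand-tagged for illustration.

```python
def pos_proportions(tagged):
    """Proportion features from (token, Penn Treebank tag) pairs for one sentence."""
    n = max(len(tagged), 1)  # avoid division by zero on empty input
    tags = [tag for _, tag in tagged]
    words = [token.lower() for token, _ in tagged]
    return {
        "adjectives": sum(t.startswith("JJ") for t in tags) / n,
        "verbs": sum(t.startswith("VB") for t in tags) / n,  # modals (MD) counted separately
        "word_to": words.count("to") / n,
        "modal_verbs": tags.count("MD") / n,
        "numbers": tags.count("CD") / n,
        "adverbs": sum(t.startswith("RB") for t in tags) / n,
        "nouns": sum(t.startswith("NN") for t in tags) / n,
    }


sentence = [("Candidates", "NNS"), ("must", "MD"), ("be", "VB"), ("able", "JJ"),
            ("to", "TO"), ("travel", "VB"), ("frequently", "RB"), ("abroad", "RB")]
print(pos_proportions(sentence))
# {'adjectives': 0.125, 'verbs': 0.25, 'word_to': 0.125, 'modal_verbs': 0.125, 'numbers': 0.0, 'adverbs': 0.25, 'nouns': 0.125}
```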
Feature | Number of derived features | Variable type
Proportion of nouns | 1 | Numeric
Proportion of nouns, verbs, adjectives, adverbs, and other part-of-speech tags followed by a verb | 5 | Numeric
Proportion of unique words found only in work activity sentences (based on the labelled data) | 1 | Numeric
Proportion of unique words found only in worker attribute sentences (based on the labelled data) | 1 | Numeric
Frequency of keywords for work activity and worker attribute sentences | 149 | Numeric
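The keyword-frequency features turn each sentence into a fixed-length vector of counts, one per keyword. The 149 keywords actually used are not listed, so the short keyword lists below are hypothetical.

```python
from collections import Counter

# Hypothetical keywords; the real list (149 keywords) is not given in the slides.
ACTIVITY_KEYWORDS = ["develop", "maintain", "manage", "design"]
ATTRIBUTE_KEYWORDS = ["skills", "experience", "ability", "degree"]
KEYWORDS = ACTIVITY_KEYWORDS + ATTRIBUTE_KEYWORDS


def keyword_frequencies(tokens, keywords=KEYWORDS):
    """Count each keyword in a tokenised sentence, returning counts in a fixed order."""
    counts = Counter(tokens)
    return [counts[k] for k in keywords]


print(keyword_frequencies(["design", "and", "develop", "skills", "develop"]))
# [2, 0, 0, 1, 1, 0, 0, 0]
```

Keeping the keyword order fixed means every sentence maps to the same columns of the feature matrix.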
Approach 2 (Kim, 2014)
• Embed words as vectors in a multidimensional space (word2vec)
• Train a CNN on top of the word2vec embeddings to classify sentences
Word2vec (skip-gram model)
Image credit: McCormick (2016)
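The skip-gram model (Mikolov et al., 2013) learns embeddings by predicting the words around each centre word. This sketch shows only how the (centre, context) training pairs are generated from a tokenised sentence; the window size is an assumption, and the network training itself (for example with negative sampling) is omitted.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (centre word, context word) training pairs for the skip-gram model."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs


print(skipgram_pairs(["excellent", "communication", "skills"], window=1))
# [('excellent', 'communication'), ('communication', 'excellent'), ('communication', 'skills'), ('skills', 'communication')]
```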
Words similar to "communication"
Word | Cosine similarity
interpersonal | 0.90
verbal | 0.90
skills | 0.88
written | 0.85
strong | 0.84
excellent | 0.83
good | 0.83
communicator | 0.81
ability | 0.80
organisational | 0.80
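The scores in these tables are cosine similarities between word vectors. A minimal sketch of the lookup, using made-up 3-dimensional vectors (the embeddings here were 300-dimensional):

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(word, embeddings, topn=3):
    """Rank the other words by cosine similarity to `word`."""
    target = embeddings[word]
    scores = [(other, cosine(target, vec)) for other, vec in embeddings.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy vectors, invented for illustration only.
embeddings = {
    "communication": [0.9, 0.1, 0.0],
    "interpersonal": [0.8, 0.2, 0.1],
    "travel": [0.0, 0.9, 0.4],
    "abroad": [0.1, 0.8, 0.5],
}
print([w for w, _ in most_similar("communication", embeddings, topn=2)])  # ['interpersonal', 'abroad']
```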
Words similar to "travel"
Word | Cosine similarity
overseas | 0.68
travelling | 0.66
occasionally | 0.64
international | 0.64
locations | 0.64
europe | 0.62
abroad | 0.54
car | 0.52
flexible | 0.52
visits | 0.50
essential | 0.50
CNN
Image credit: Kim (2014)
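In Kim's (2014) CNN, each filter slides over windows of consecutive word vectors, applies a non-linearity, and keeps only its largest response (max-over-time pooling); the pooled values of all filters form the input to a dropout and softmax layer, not shown here. A plain-Python sketch of the convolution and pooling step with toy weights; the model here used filter widths 3, 4 and 5, but widths 2 and 3 keep the example small.

```python
def conv_max_pool(sentence, filt, bias):
    """Slide one h x d filter over the sentence, apply ReLU, then max-over-time pooling.

    sentence: list of n word vectors, each of length d.
    filt: list of h rows, each of length d.
    """
    h, d = len(filt), len(filt[0])
    # Zero-pad sentences shorter than the filter so at least one window exists.
    padded = sentence + [[0.0] * d for _ in range(max(0, h - len(sentence)))]
    feature_map = []
    for i in range(len(padded) - h + 1):
        window = padded[i:i + h]
        activation = bias + sum(
            w * x for f_row, x_row in zip(filt, window) for w, x in zip(f_row, x_row)
        )
        feature_map.append(max(0.0, activation))  # ReLU
    return max(feature_map)  # keep the strongest response anywhere in the sentence


def sentence_features(sentence, filters):
    """One pooled value per (filter, bias) pair, concatenated into a feature vector."""
    return [conv_max_pool(sentence, f, b) for f, b in filters]


sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # toy 2-d word vectors
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),               # width 2
    ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -1.0),  # width 3
    ([[-1.0, -1.0], [-1.0, -1.0]], 0.0),           # width 2, never fires
]
print(sentence_features(sentence, filters))  # [2.0, 3.0, 0.0]
```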
CPU vs GPU
Type | Run time | Training accuracy (10-fold cross-validation) | Test accuracy
CPU: Intel i7-4790 CPU @ 3.60 GHz | 30 days | 99.990% | 99.807%
GPU: NVIDIA GeForce GTX 745 | ~4.5 days | 99.992% | 99.782%
Hyperparameters
• Embedding dimension: 300
• Convolution filter sizes: 3, 4, 5
• Batch size: 50
• Dropout probability: 0.50
• Number of epochs: 50
Power? Energy?
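The run times above imply that the GPU finished roughly 30 / 4.5 ≈ 6.7 times faster, while training and test accuracies stayed within 0.03 percentage points of the CPU run. A small sketch that records the listed hyperparameters as a config object and computes that ratio; the class and field names are our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnnConfig:
    """Hyperparameters as listed on the slide (field names are illustrative)."""
    embedding_dim: int = 300
    filter_sizes: tuple = (3, 4, 5)
    batch_size: int = 50
    dropout: float = 0.5
    epochs: int = 50


def speedup(cpu_days, gpu_days):
    """How many times faster the GPU run finished than the CPU run."""
    return cpu_days / gpu_days


config = CnnConfig()
print(config.filter_sizes)         # (3, 4, 5)
print(round(speedup(30, 4.5), 2))  # 6.67
```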
Validation
• Cross-validation
• Comparison with an independent expert
• Comparison with a task inventory
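In 10-fold cross-validation the labelled sentences are split into 10 folds; each fold serves once as the held-out set while the other nine are used for training, and the 10 scores are averaged. A sketch of the index split; the shuffling seed is an assumption, and stratification by class is not shown.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n items."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(25, k=10):
    pass  # fit on `train`, score on `test`, then average the k scores
```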
Demo
https://vkobayashi.shinyapps.io/labelme/
Just copy and paste a vacancy (in any language)
Key worker attributes from topic modeling (number of topics)
Cao Juan, Xia Tian, Li Jintao, Zhang Yongdong, and Tang Sheng. 2009. A density-based method for adaptive LDA model selection. Neurocomputing (16th European Symposium on Artificial Neural Networks 2008) 72, 7–9: 1775–1781. http://doi.org/10.1016/j.neucom.2008.06.011
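Cao et al. (2009) relate model quality to how similar the topics are to one another: a good number of topics K gives topics with low average pairwise cosine similarity. This is a simplified sketch of that selection criterion only, not the full density-based procedure; it assumes each candidate model is given as its topic-word matrix (K ≥ 2 rows), and the two toy models are invented.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_topic_similarity(topic_word):
    """Mean pairwise cosine similarity between the topics of one fitted model."""
    pairs = list(combinations(topic_word, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def pick_num_topics(models):
    """models maps a candidate K to its K x V topic-word matrix; pick the least similar."""
    return min(models, key=lambda k: average_topic_similarity(models[k]))


models = {
    2: [[0.5, 0.5, 0.0, 0.0], [0.5, 0.4, 0.1, 0.0]],                        # overlapping topics
    3: [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],  # distinct topics
}
print(pick_num_topics(models))  # 3
```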
Key worker attributes from topic modeling (Topics)
Topic 100: development, software, agile, methodologies, application, scrum, design, life
Topic 86: new, learn, quickly, willingness, adapt, technologies, internet, desire
Topic 132: travel, willingness, willing, work, time, needed, internationally, international
Topic 20: sales, selling, salesforcecom, outside, crm, success, account, inside
Topic 75: communication, written, oral, verbal, interpersonal, presentation, effective, listening
Topic 18: highly, motivated, oriented, self, driven, organized, starter, selfstarter
Topic 61: license, valid, drivers, driving, record, transportation, reliable, vehicle
Topic 16: data, analysis, quantitative, research, statistics, economics, statistical, modeling
Topic 60: scripting, python, linux, programming, java, perl, languages, unix
Kobayashi, V. B., Berkers, H. A., Mol, S. T., Kismihok, G., & Den Hartog, D. N. (2017). Text Mining in Organizational Research. Organizational Research Methods.
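Each topic above is summarised by its eight most probable words. Given a fitted model's topic-word probabilities, those lists can be read off as follows; the vocabulary and probabilities are made up.

```python
def top_words(topic_word, vocabulary, n=8):
    """Return the n highest-probability words of each topic.

    topic_word[k][w] is the probability of vocabulary[w] under topic k.
    """
    return [
        [vocabulary[w] for w in sorted(range(len(row)), key=lambda w: row[w], reverse=True)[:n]]
        for row in topic_word
    ]


vocabulary = ["python", "linux", "license", "driving", "data", "statistics"]
topic_word = [
    [0.40, 0.30, 0.00, 0.00, 0.20, 0.10],
    [0.00, 0.05, 0.50, 0.40, 0.05, 0.00],
]
print(top_words(topic_word, vocabulary, n=2))  # [['python', 'linux'], ['license', 'driving']]
```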
Job clusters according to worker attributes
[Figure: Job Cluster 98 and Job Cluster 86]
Applications
• Task analysis (with expert validation) done in collaboration with the Pro-Nursing Project
http://www.pro-nursing.eu/web/
Applications
• Hybrid Teachers, in collaboration with Hybride Docent and De Baaningenieurs
https://vkobayashi.shinyapps.io/analysis_en_vis/
Other Applications
• Job Test Validation
CV Mining: a glimpse (with USG People and Endouble)
• Given a candidate's education, what is the most likely career progression?
• Use education and employment history to predict the next job
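The slides do not say which model was used for next-job prediction. As a stand-in, this sketch uses a first-order Markov chain: count how often one job (or function class) follows another in employment histories, then predict the most frequent successor. Education is ignored here, the histories are invented, and the labels borrow function-class names from the Challenge slide.

```python
from collections import Counter, defaultdict


def fit_transitions(histories):
    """Count job-to-job transitions over chronologically ordered employment histories."""
    transitions = defaultdict(Counter)
    for history in histories:
        for current, nxt in zip(history, history[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, current_job):
    """Most frequent next job after `current_job`, or None if it was never observed."""
    if not transitions.get(current_job):
        return None
    return transitions[current_job].most_common(1)[0][0]


histories = [
    ["Horeca", "Productie", "Vervoer en transport"],
    ["Horeca", "Productie", "Productie"],
    ["Productie", "Vervoer en transport"],
]
model = fit_transitions(histories)
print(predict_next(model, "Productie"))  # Vervoer en transport
```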
Example: Candidate id number 135994
• Education: "atheneum", "conservatorium saxofoon dmlichtemuziek", "cursus logic pro 9 professionelemuziekproductie", "cursus nlp neurolinguïstischprogrammeren practitioner", "cursus tot beoordelend", "cursus tot claimbehandelaar voor de ziektewet"
• Employment: "veel", "productie en fabriekswerk", "taxichauffeur", "telefonist", "claimbehandelaar afdeling ziektewet", "medewerker", "beslisser claim afdeling werkloosheidswet", "saxofoondocent", "saxofoondocent", "postkamermedewerker"
Challenge
• How to deal with the huge number of distinct job titles and education titles?
• Work with function classes instead of job titles
  • 22 function classes
  • e.g. Defensie (defence), Vervoer en transport (transport), Horeca (hospitality), Productie (production), etc.
Results: Example cluster
1. assistentaccountant
2. assistent accountant
3. beginnend assistent accountant
4. assistenteaccountant
5. accountant assistent
6. officier assistent accountant
7. assistent accountant inboeken
8. account assistent
9. accountassistente
10. assistent accounting
11. 1e assistent accountant
12. accounting assistent
13. assistent accountantc
14. assistent accountancy
15. assistent accountant ai
16. assistent accountant ab
17. ervaren assistent accountant
[1] "schoonmaker" [2] "schoonmaker ouderlijk bedrijf" [3] "schoonmaker vnl" [4] "schoonmaker spoelkeuken" [5] "schoonmaker horecabedrijven" [6] "schoonmaken" [7] "schoonmaker service" [8] "schoonmaker scholen" [9] "schoonmaker keukenhulp" [10] "schoonmaker produktie" [11] "schoonmaker bezorger" [12] "vegen schoonmaken" [13] "schoonmaker droge schone iss" [14] "schoonmaker schoonmaken" [15] "ok schoonmaker" [16] "masseur schoonmaker" [17] "schoonmaak fabriek schoonmaken" [18] "schoonmaker nederland" [19] "afwasserschoonmaken" [20] "schoonmaker fa" [21] "schoonmaker horeca" [22] "keukenhulp schoonmaak" [23] "hectas schoonmaken"
[1] "directiesecretaresseteamleidercommerciële administratie"[2] "directiesecretaresse administratief" [3] "directie secretaresse administratief" [4] "directiesecretaresse loonadministratrice" [5] "stagiaire directie secretaresse en administratief" [6] "administratie secretaresse" [7] "stagiaire secretarieel administratief" [8] "directiesecretaressehoofd administratie" [9] "administratiefmedewerkerdirectiesecretaresse" [10] "administratie secretaresse directie secretaresse" [11] "administratiefdirectiesecretariaat" [12] "administratiefdirectiesecretaresse" [13] "administratie directie secretaresse"
[2] "coordinator sponsoring" [3] "facilities coordinator" [4] "campagne coordinator" [5] "order coordinator" [8] "management coordinator" [9] "response coordinator" [10] "banquet sales coordinator" [11] "teamleider coordinator" [12] "coordinator new business team" [13] "coordinator" [15] "steward team coordinator" [17] "sales coordinator export" [18] "sales coordinator" [19] "team coordinator surgical" [20] "procurement coordinator export" [21] "freelance buscoordinator" [22] "event coordinator" [23] "coordinator emplooi" [24] "allio coordinator tso" [25] "program coordinator" [26] "programme coordinator" [27] "events coordinator" [28] "event and sponsorship coordinator"
[1] "kapster mede eigenaresse" [2] "eigenaresse" [3] "mede eigenaresse" [4] "eigenaresse winkel" [5] "eigenaresse kledingwinkel" [6] "medeeigenares" [7] "eigenaresse laifde" [8] "eigenaresse van slagerij en delicatessenwinkel"[9] "medeeigenaresse" [10] "eigenaresse kinderboetiek"
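The slides do not describe how the job-title variants were grouped. One simple way to obtain clusters like these, shown here as a swapped-in technique rather than the method used, is greedy string-similarity clustering with Python's difflib: spaces are dropped so "assistent accountant" matches "assistentaccountant", and a title joins the first cluster whose seed has a SequenceMatcher ratio of at least 0.75 (an assumed threshold).

```python
from difflib import SequenceMatcher


def normalise(title):
    """Lower-case and drop spaces so spacing variants of a title compare as equal."""
    return title.lower().replace(" ", "")


def cluster_titles(titles, threshold=0.75):
    """Greedy single pass: add each title to the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed key, member titles)
    for title in titles:
        key = normalise(title)
        for seed, members in clusters:
            if SequenceMatcher(None, seed, key).ratio() >= threshold:
                members.append(title)
                break
        else:
            clusters.append((key, [title]))
    return [members for _, members in clusters]


titles = ["assistent accountant", "assistentaccountant", "assistente accountant",
          "schoonmaker", "schoonmaker horeca", "event coordinator", "events coordinator"]
for cluster in cluster_titles(titles):
    print(cluster)
```

Because each title is compared only with cluster seeds, the result depends on input order; a production system would more likely use proper clustering over title embeddings or n-gram vectors.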
Towards good matching (with Endouble and USG People)
• Job characteristics
  • Role requirements
• Individual characteristics
  • Education
  • Previous job experience
  • Gender, age
  • Other
References
• Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882 [cs]. Retrieved from http://arxiv.org/abs/1408.5882
• Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. Retrieved from http://arxiv.org/abs/1301.3781
• McCormick, C. (2016, April 19). Word2Vec Tutorial - The Skip-Gram Model. Retrieved from http://www.mccormickml.com
• https://mxnet.incubator.apache.org/tutorials/nlp/cnn.html
• Kobayashi, V. B., Berkers, H. A., Mol, S. T., Kismihok, G., & Den Hartog, D. N. (2017). Text Mining in Organizational Research. Organizational Research Methods. Manuscript in preparation.
• Kobayashi, V. B., Mol, S. T., Berkers, H. A., Kismihok, G., & Den Hartog, D. N. (2017). Text Classification for Organizational Research: A Tutorial. Organizational Research Methods.
• Kobayashi, V., Mol, S. T., Kismihok, G., & Hesterberg, M. (2017). Automatic Extraction of Nursing Tasks from Online Job Vacancies. In M. Fathi, M. Khobreh, & F. Ansari (Eds.), Professional Education and Training through Knowledge, Technology and Innovation (pp. 51-56). Siegen, Germany: Universitätsverlag Siegen.
http://www.eduworks-network.eu/
This work was supported by the European Commission through the Marie Curie Initial Training Network EDUWORKS (grant number PITN-GA-2013-608311)
We are forever grateful to…