Crawling, Parsing and Semantic Matching of Vacancies and CVs: Semantic Recruitment Technology
Jakub Zavrel, Textkernel, InGRID Workshop, 11-2-2014

Page 1

Crawling, Parsing and Semantic Matching of Vacancies and CVs

Semantic Recruitment Technology

Jakub Zavrel, Textkernel, InGRID Workshop, 11-2-2014

Page 2

Textkernel:

• Spinoff from R&D in machine learning and language technology

• Founded 2001, offices in Amsterdam (HQ), Frankfurt, Paris, 45 employees; strong R&D focus

• Deloitte Fast 50 2007, 2010, 30% YoY growth

• Core technology: Understanding unstructured text data. Multi-lingual

Market:

• Job boards, Recruitment Software, Staffing and recruitment, Mobility, Large Employers

• Products:

• Multi-lingual tools (15 languages) to extract CVs and jobs

• Jobfeed: largest real-time DB for job market analysis

• Search! & Match! to connect people and jobs

• Customers: UWV, Pole Emploi, Adecco, Randstad, USG, Monster, Stepstone, XING, SAP, Unisys, Bosch, Axa, Philips, etc. (350 direct, 2000+ indirect)

• Large partner network (HR & recruitment software)

Page 3

I like programming, but I’m interested in taking on more project management responsibility.

Is there a job in our organisation that better fits my degree?

I’d like to work on our mobile strategy. I’ve helped a friend develop a mobile app.

I’d like to do more with my organisational talent.

We are looking to hire: an experienced tech team lead

Language gap

The ideal candidate has:
- min. 5 yrs of experience
- Certified Scrum Master
- Exp. w/ iOS, Android

Completed academic studies in Computer Science or a related field

30% travel for customer presentations

Page 4

The job ad searches the database directly and identifies relevant candidates (or vice versa)…

Page 6

Automatically convert each document into a complete record

Extract! CV/Job Parsing
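The talk describes parsing built on machine learning and language technology. As a much simpler illustration of what "convert each document into a complete record" means, here is a rule-based toy in Python. The regular expressions, the `CandidateRecord` fields and the `KNOWN_SKILLS` list are all hypothetical; this is not how Extract! works.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Toy patterns for two contact fields (assumption: simple, Western-style formats).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d -]{7,}\d")
# Hypothetical skill vocabulary; a real parser would draw on a large taxonomy.
KNOWN_SKILLS = {"java", "spring", "scrum", "excel"}


@dataclass
class CandidateRecord:
    """Structured record filled in from an unstructured CV."""
    email: Optional[str] = None
    phone: Optional[str] = None
    skills: list[str] = field(default_factory=list)


def parse_cv(text: str) -> CandidateRecord:
    """Fill a CandidateRecord using two regexes and a skill word list."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    words = set(re.findall(r"[a-z]+", text.lower()))  # crude tokenization
    return CandidateRecord(
        email=email.group(0) if email else None,
        phone=phone.group(0) if phone else None,
        skills=sorted(words & KNOWN_SKILLS),
    )


if __name__ == "__main__":
    cv = "Jan de Vries\njan.devries@example.com\n+31 20 123 4567\nJava developer, Scrum, Spring Boot"
    print(parse_cv(cv))
    # CandidateRecord(email='jan.devries@example.com', phone='+31 20 123 4567', skills=['java', 'scrum', 'spring'])
```

Hand-written rules like these break quickly on real CV layouts, which is why the talk relies on trained models instead.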

Page 7

Extract!

Page 8

Extract!

Page 9

Extract!

Page 10

Extract!

Page 11

Extract! – Zero data entry job application

Page 12

Extract!

Page 13

• Time savings in coding CVs and jobs
• If you accept some noise, 100% time savings
• Structured data allows better search: semantic searching and matching
• Coding enables reporting and statistics

Extract!

Page 14

• Coding follows extraction
• Customer-specific or standard taxonomies
• String-similarity-based normalization
• Many synonyms per language
• Distance = confidence
• Problem cases: ambiguity, context, long tail
• More complex models can help (classifiers, multivariate models)
• Semantic matching is better (occupation coding errors are counterbalanced by other variables)

Occupation coding!
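A minimal sketch of string-similarity-based normalization, in which the similarity score doubles as the confidence (1 minus a distance). It uses Python's standard `difflib.SequenceMatcher`; the `SYNONYMS` table, the codes `ADM-01` and `IT-07`, and the `min_confidence` threshold are illustrative assumptions, not Textkernel's taxonomy or method.

```python
from difflib import SequenceMatcher

# Hypothetical mini-taxonomy: synonym -> occupation code (illustrative only).
SYNONYMS = {
    "administrative assistant": "ADM-01",
    "administrative employee": "ADM-01",
    "office support": "ADM-01",
    "java developer": "IT-07",
    "software developer": "IT-07",
}


def normalize(raw_title, synonyms=SYNONYMS, min_confidence=0.8):
    """Map a free-text job title to the code of its most similar synonym.

    Confidence is difflib's similarity ratio in [0, 1], i.e. 1 - distance.
    Below min_confidence (a hypothetical threshold) the code is None.
    """
    title = " ".join(raw_title.lower().split())  # case- and whitespace-normalize
    best_code, best_ratio = None, 0.0
    for synonym, code in synonyms.items():
        ratio = SequenceMatcher(None, title, synonym).ratio()
        if ratio > best_ratio:
            best_code, best_ratio = code, ratio
    if best_ratio < min_confidence:
        return None, best_ratio
    return best_code, best_ratio


if __name__ == "__main__":
    print(normalize("Java  Developer"))  # ('IT-07', 1.0)
    print(normalize("zzzz"))             # (None, 0.0)
```

Titles that fall below the threshold come back uncoded. This is one way to route the ambiguous and long-tail cases to more complex models or manual review.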

Page 15

• Semantic search: “Lets you find what you mean, not what you type”

Impression...

Search!
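One ingredient of "find what you mean, not what you type" is synonym-based query expansion. Before matching, the query is widened to every synonym of the concept it names. The sketch assumes a hand-written `SYNONYM_GROUPS` list (hypothetical) and plain substring matching. It is not how Search! is implemented.

```python
# Hypothetical synonym groups; in practice these would come from the
# multilingual occupation taxonomy rather than a hand-written list.
SYNONYM_GROUPS = [
    {"administrative assistant", "administrative employee", "office support"},
    {"software developer", "programmer", "java developer"},
]


def expand_query(query: str) -> set[str]:
    """Return the normalized query plus every synonym in its group(s)."""
    q = " ".join(query.lower().split())
    expanded = {q}
    for group in SYNONYM_GROUPS:
        if q in group:
            expanded |= group
    return expanded


def search(query: str, documents: list[str]) -> list[str]:
    """Return the documents that contain the query or any of its synonyms."""
    terms = expand_query(query)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


if __name__ == "__main__":
    ads = ["Senior Java Developer wanted", "Office support, part-time", "Sales manager"]
    print(search("programmer", ads))  # ['Senior Java Developer wanted']
```

The substring test is deliberately naive. A search engine would match on tokens or on normalized concept codes instead.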

Page 16

Match!

Page 17

Semantic Matching Technology:

• Natural Language Processing

• Machine Learning

• Semantic Analysis

• Probabilistic Language Model

• Search Engine

• Multi-lingual taxonomies

• Recruitment knowledge-bases
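The occupation-coding slide notes that semantic matching tolerates coding errors because other variables counterbalance them. Here is a minimal sketch of that effect, assuming a linear weighted score over a few structured fields. The `Profile` schema, the per-field similarities and the `WEIGHTS` values are all hypothetical. Match! itself combines the richer techniques listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Structured fields of a parsed CV or job ad (hypothetical schema)."""
    occupation: str                           # occupation code from the taxonomy
    skills: set[str] = field(default_factory=set)
    education_level: int = 0                  # hypothetical ordinal scale
    city: str = ""


# Hypothetical weights; a real matcher would tune or learn these.
WEIGHTS = {"occupation": 0.4, "skills": 0.4, "education": 0.1, "location": 0.1}


def match_score(job: Profile, cand: Profile, weights=WEIGHTS) -> float:
    """Weighted sum of per-field similarities, each in [0, 1]."""
    occupation = 1.0 if job.occupation == cand.occupation else 0.0
    skills = len(job.skills & cand.skills) / len(job.skills) if job.skills else 0.0
    education = 1.0 if cand.education_level >= job.education_level else 0.0
    location = 1.0 if job.city == cand.city else 0.0
    return (weights["occupation"] * occupation
            + weights["skills"] * skills
            + weights["education"] * education
            + weights["location"] * location)


if __name__ == "__main__":
    job = Profile("IT-07", {"java", "spring", "scrum"}, 4, "Amsterdam")
    # Occupation mis-coded during parsing, but the other fields fit the job.
    miscoded = Profile("IT-99", {"java", "spring", "scrum"}, 4, "Amsterdam")
    # Correct occupation code, little else in common.
    weak = Profile("IT-07", set(), 2, "Utrecht")
    print(match_score(job, miscoded) > match_score(job, weak))  # True
```

In this example the candidate with the wrong occupation code still scores about 0.6 against about 0.4, because skills, education and location agree with the job.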

Page 18

Demo

Page 19

Search and analyse real-time online job ads as well as historical data

Jobfeed

Page 20

Jobfeed

Page 21

Jobfeed!

Knowledge of all demand for labour in the European job market

– Sales leads for recruitment and staffing companies
– Real-time labour market analytics tools
– Largest database of jobs for matching the unemployed
– Perfect data source for text mining

Page 22

Jobfeed!

• Real-time collection of online job ads from any (unstructured) source
• Available in NL, DE, FR, IT
• Gradually rolling out in the rest of Europe
• Richly semantically structured data

Page 23

Jobfeed!

Page 24

Jobfeed: Multilingual Occupation Taxonomy

Occupations:
• >4000 codes
• 4 languages
• 3-layer hierarchy
• >50K synonyms

Link to other concepts:
- Skills
- Education level
- Sector
- O*NET
- UWV (Dutch Employment Agency)
- ROME

Based on millions of jobs, years of customer feedback and experience!

Example: NL: administratief medewerker, EN: administrative assistant, FR: employé administratif, DE: Verwaltungsassistent (m/w).

Group: administrative personnel
Class: Administration and Customer Service
Synonyms: administrative employee, assistant clerk, office support

Skills: MS Office, Excel, English language, etc.

O*NET: 43-9199.00: Office and Administrative Support Workers, All Other
UWV: 1000402563: Administratief medewerker secretariaat
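One way to represent an entry like the example above is shown next, together with an index that resolves any label or synonym, in any of the four languages, to the same occupation. The field layout and the code `ADM-01` are assumptions. The labels, group, class, synonyms, skills and external codes are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Occupation:
    """One occupation in a multilingual taxonomy (field layout is an assumption)."""
    code: str                                 # hypothetical internal code
    labels: dict[str, str]                    # language -> preferred label
    group: str                                # upper layers of the hierarchy
    occupation_class: str
    synonyms: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)
    external_codes: dict[str, str] = field(default_factory=dict)  # O*NET, UWV, ...


# The example entry from the slide; only the code "ADM-01" is made up.
ADMIN = Occupation(
    code="ADM-01",
    labels={
        "nl": "administratief medewerker",
        "en": "administrative assistant",
        "fr": "employé administratif",
        "de": "Verwaltungsassistent (m/w)",
    },
    group="administrative personnel",
    occupation_class="Administration and Customer Service",
    synonyms=["administrative employee", "assistant clerk", "office support"],
    skills=["MS Office", "Excel", "English language"],
    external_codes={"O*NET": "43-9199.00", "UWV": "1000402563"},
)


def build_index(occupations: list[Occupation]) -> dict[str, Occupation]:
    """Map every lower-cased label and synonym to its occupation."""
    index: dict[str, Occupation] = {}
    for occ in occupations:
        for term in [*occ.labels.values(), *occ.synonyms]:
            index[term.lower()] = occ
    return index


if __name__ == "__main__":
    index = build_index([ADMIN])
    print(index["employé administratif"].external_codes["O*NET"])  # 43-9199.00
```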

Page 25

Demo

Page 26

Jobfeed as material for research

Page 31

Frequent words for "Java developer"

en van de een je met in het Java of Je op is voor te ervaring aan als and software om team zijn kennis bij Ervaring die the naar a jaar jij bent Developer HBO hebt to werken werk

Page 32

Frequent words for all professions

en van de een in het je met op Je voor te is of zijn aan bent naar bij om als ervaring die Het hebt deze werken zoek De wij functie onze ben tot over werk opleiding uit and werkzaamheden dat binnen u Als Voor zelfstandig kennis ook s verantwoordelijk

Page 33

Solution: contrast frequencies

| | Java developer jobs | All jobs |
|---|---|---|
| # jobs where w occurs | A | B |
| Total # jobs | C | D |

• Observed frequency of w: $O(w) = A$
• Expected frequency of w: $E(w) = C \cdot B / D$
• Pick the words with the highest score: $\mathrm{score}(w) = (O(w) - E(w))^2 / E(w)$
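The contrast score can be computed directly from document counts. In this sketch, A and B are the numbers of Java-developer ads and of all ads that contain w, C and D are the corresponding totals, E = C·B/D, and score = (A - E)²/E, which is the per-cell term of Pearson's chi-square statistic. Dropping words with A ≤ E is an added assumption. Without it, words that are rarer than expected in the target ads would score highly too. Small tokenized toy ads stand in for real Jobfeed data.

```python
from collections import Counter
from typing import Iterable


def doc_freq(docs: Iterable[Iterable[str]]) -> Counter:
    """Number of documents (job ads) in which each word occurs at least once."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(set(doc))  # set(): a word counts once per ad
    return counts


def contrast_scores(target_docs, all_docs, only_overrepresented=True):
    """score(w) = (O - E)^2 / E with O = A and E = C * B / D.

    target_docs: tokenized ads for one occupation (e.g. "Java developer");
    all_docs: tokenized ads for all occupations, assumed to include target_docs.
    Only words that occur in target_docs are scored.
    """
    target_docs, all_docs = list(target_docs), list(all_docs)
    a_counts, b_counts = doc_freq(target_docs), doc_freq(all_docs)
    c, d = len(target_docs), len(all_docs)
    scores = {}
    for w, a in a_counts.items():
        b = b_counts[w]
        if b == 0:
            continue  # only possible if target_docs is not a subset of all_docs
        expected = c * b / d
        # Assumption (not on the slide): skip under-represented words.
        if only_overrepresented and a <= expected:
            continue
        scores[w] = (a - expected) ** 2 / expected
    return scores


def top_words(scores, k=10):
    """Highest-scoring words first; ties broken alphabetically."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    java_ads = [["java", "spring", "en", "de"], ["java", "maven", "en", "de"]]
    other_ads = [["sales", "en", "de"], ["excel", "en", "de"]]
    print(top_words(contrast_scores(java_ads, java_ads + other_ads), k=3))
    # [('java', 1.0), ('maven', 0.5), ('spring', 0.5)]
```

Function words such as "en" and "de" occur in every ad, so their observed count equals the expected one and they drop out. This is the effect that separates the raw frequency lists from the top-word list.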

Page 34

Top words for "Java developer"

java developer software spring scrum agile hibernate ontwikkelaar u j2ee development maven applicaties ervaring web de frameworks jboss mbo senior wij xml jee o javascript you kennis ontwikkelen oracle ontwikkeling architectuur webservices informatica werkzaamheden technologie developers eclipse bezit het team wo rijbewijs technieken tomcat the vca zelfstandig architect werklocatie html

Building rich skills profiles for thousands of occupations from millions of real-time jobs…

… new trends and occupations…

Page 35

Supply & Demand

• Have: lots of data, technology, ideas

• Want: labor market expertise, students, research

Page 36

Semantic Recruitment Technology

Thanks!