Big Data and Labour Market
www.ceps.eu
Making use of big data to enrich labour market intelligence
Miroslav Beblavý, Senior Research Fellow, Head of Jobs & Skills Unit
CEPS and its Jobs & Skills Unit
- CEPS: a major European think tank covering a whole range of topics relevant to Europe and European integration
- Based in Brussels
- Our work covers a range of topics:
• Labour market policies and institutions
• Labour mobility and migration
• Education, skills and vocational training
• Impact of technological and demographic change
• Inequality
- EU-funded research projects (NEUJOBS, INGRID) kick-started our work on Big Data and the labour market
Outline
Methodological overview
How to get the data?
Data analysis
Representativeness and biases
Our applications
Previous work (2011-2013) covered the use of EURES data (CZ, DK, IE) and European portals (SK) to analyse non-cognitive skills; not covered here
New work (2014-2016):
General skills demand in the US
Demand for IT skills
Demand for language skills
How to collect the vacancy data
Access an existing database
High N
Clean, coded data
Good coverage of the market if a large number of sources is included
Crawl your own data
Necessary in some countries
Possibility to include more variables/metadata
Relatively low barrier to entry
Data analysis
Particularly with your own scraped data, data management is a substantial task
Computing resources are an important bottleneck: a query can easily take hours (migrate away from Access if possible)
Many ways to extract meaning from unstructured text: word count, word density, machine learning, text mining
Do not underestimate the importance of metadata
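The simplest of the text-analysis options listed above, word count and word density, can be sketched as follows. This is a minimal illustration only: the keyword list, the tokenizer and the example advertisement are assumptions made for the sketch, not the lists or data used in the studies.

```python
import re
from collections import Counter

# Hypothetical keyword list; a real study would use a longer, validated list.
KEYWORDS = ["english", "german", "excel", "team"]


def tokenize(text):
    """Lowercase the text and split it into ASCII word tokens.

    Note: this simple pattern drops accented letters, so it would need
    adapting for Czech, Hungarian, Polish or Slovak advertisements.
    """
    return re.findall(r"[a-z]+", text.lower())


def keyword_stats(text, keywords=KEYWORDS):
    """Return {keyword: (count, density)}, where density = count / all tokens."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty ads
    return {kw: (counts[kw], counts[kw] / total) for kw in keywords}


# Made-up advertisement text, for illustration only.
ad = "Fluent English required. Good Excel skills; English-speaking team."
stats = keyword_stats(ad)
```

Word density normalises the raw count by the length of the advertisement, which matters when long and short vacancies are compared.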
Representativeness and limitations
No way to know what share of all vacancies is captured (many are never advertised in the first place). Furthermore, scraping is increasingly blocked (robots.txt)
Perhaps a better question: how representative is our sample of the vacancies that ARE posted online?
Another issue: languages and local context (proficiency in English might mean different things in Sweden and in Spain)
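Since robots.txt is mentioned above as a growing obstacle, a crawler can check a portal's rules before fetching any page. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the portal address and the user-agent name are all hypothetical; a real crawler would download the file from the portal (for example with `set_url()` and `read()`).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /jobs
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been obtained."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
AGENT = "research-crawler"  # hypothetical user-agent name

# Individual vacancy pages are allowed, search result pages are not.
ok_vacancy = parser.can_fetch(AGENT, "https://portal.example/jobs/123")
ok_search = parser.can_fetch(AGENT, "https://portal.example/search?q=it")
```

Pages that the portal disallows are missing from the sample by construction, which is one more source of bias to document.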
Our work: vacancies and tags
We rely on vacancy data and data extracted from job portals (‘tags’)
Exploit the advantages these data bring relative to traditional data sources
Aim: better understand labour demand
Vacancies: assemble and process job advertisements and then perform a text analysis
pros: highly detailed, real-time information
cons: data- and time-intensive
Tags: used to structure information on a portal; we keep track of the tags and the number of vacancies matching each tag
pros: easy and fast, less data- and time-intensive
cons: less detail, not possible on every portal
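The tag-based approach needs only the portal's counters, which is why it is fast. A minimal sketch of turning tag counts into shares of vacancies follows; the counts are made up for illustration and the function name is an assumption of this sketch.

```python
def tag_shares(tag_counts, total_vacancies):
    """Share of vacancies matching each tag.

    A vacancy can carry several tags, so the shares need not sum to 1.
    """
    if total_vacancies <= 0:
        raise ValueError("total_vacancies must be positive")
    return {tag: n / total_vacancies for tag, n in tag_counts.items()}


# Made-up snapshot of a portal's tag counters, for illustration only.
snapshot = {"English": 520, "German": 120, "French": 20}
shares = tag_shares(snapshot, total_vacancies=1000)
```

Storing dated snapshots of such counters would also allow changes in demand to be tracked over time without downloading any vacancy text.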
Demand for language skills in Visegrad
Working Paper: “The Importance of Foreign Language Skills in the Labour Markets of Central and Eastern Europe: An assessment based on data from online job portals”
Aim: to identify demand for foreign language skills in Czechia, Hungary, Poland, and Slovakia
knowledge of languages in Europe
language as part of human capital
ET2020 report: globalisation
multicultural / multilingual context
demand for foreign language and communication skills ↑
Methodology & Data: analyse tags on four job boards (linked to 74,000 vacancies), then focus on a subset of occupations available in all four countries
Main results:
Foreign languages are demanded in 1/3 to 3/4 of the job advertisements
English is most demanded (52%)
German comes in second place (12%)
Other languages hardly appear
Demand for language skills in Visegrad
Why the Visegrad region?
Common roots, close collaboration
Open to international trade and FDI
EU Members since 2004, SK in EMU
Recent migration flows
Demand side:
English as international business language
Strong economic and historical ties with Germany and Austria: German
Shared border with Soviet Union: Russian
Other languages: French, Spanish, Italian
Languages of neighbouring countries
Supply side:
Main national languages not commonly spoken in Europe, no bilingual countries
2012 Eurobarometer:
                 English   German
EU27             38%       11%
Czech Republic   27%       15%
Hungary          20%       18%
Poland           33%       19%
Slovakia         26%       22%
Demand for language skills in Visegrad
Results across all occupations (74,000 vacancies):
English > German
Much variation
Results for a subset of 59 occupations available in all four countries (66,000 vacancies):
35 high-skilled, 22 medium-skilled and 2 low-skilled occupations
English: positive relationship between demand and complexity of occupation
High-skilled: 67% Czechia, 65% Hungary, 62% Poland and 69% Slovakia (on average)
Medium- and low-skilled: 33% Czechia, 27% Hungary, 33% Poland and 32% Slovakia (on average)
Also positive correlation with median hourly wages
No such relationships for German
          Czechia   Hungary   Poland    Slovakia   Total
English   28.19%    38.92%    63.99%    49.26%     51.89%
German    10.15%    10.86%    12.45%    14.59%     12.36%
French    0.65%     1.25%     3.56%     1.50%      2.33%
Italian   0.19%     0.67%     1.65%     0.55%      1.05%
Spanish   0.15%     0.52%     2.13%     0.48%      1.23%
Russian   0.54%     0.21%     1.6%      0.48%      0.96%
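The positive correlation between demand for English and median hourly wages noted on this slide could be checked along the following lines. This is a sketch only: the slides do not say which correlation measure was used, so Pearson's coefficient is an assumption, and the per-occupation values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


# Made-up values for five occupations, for illustration only:
# share of ads demanding English and median hourly wage.
english_share = [0.20, 0.35, 0.50, 0.65, 0.80]
median_wage = [3.0, 4.0, 4.5, 6.0, 7.5]
r = pearson(english_share, median_wage)
```

A value of `r` close to 1 would indicate that occupations with more English requirements also tend to pay more; it says nothing about causality.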
Demand for language skills in Visegrad
[Figure: two charts plotting hourly wage against the share of advertisements demanding a language, with series for SK and CZ (CZ hourly wage on the right axis). First panel: share of advertisements demanding English (0% to 100%). Second panel: share of advertisements demanding German (0% to 40%).]
Skills requirements in the US
Working Paper: “Skills Requirements for the 30 Most-Frequently Advertised Occupations in the United States: An analysis based on online vacancy data”
Aim: to map requirements of US employers for occupations of different complexities
formal education and specialised training
cognitive skills: specific, generic
non-cognitive skills: personal, social
experience
other
Methodology & Data: analyse about 2 million vacancies published on Burning Glass for the 30 most-frequently advertised occupations
→ keywords in vacancies
Main results:
Employers are demanding in their job advertisements
Formal education (67% of vacancies)
Service skills (49%, non-cognitive skill)
Experience (38%)
Skills requirements in the US
Sum of % of ads listing education and skills
Sum of % of ads listing all requirements
Employers are demanding in their job advertisements:
Positive relation with complexity
Also for low-skilled and medium-skilled occupations
But, there is a lot of variation
Top 5: security guards, tellers, event planners, managers and first-line office supervisors
ISCO   Average sum of skills   Average sum of all requirements
1      4.3 (431%)              4.9 (488%)
2      3.9 (390%)              4.6 (464%)
3      3.8 (380%)              4.7 (466%)
4      3.8 (384%)              4.5 (446%)
5      3.8 (380%)              4.5 (447%)
7      3.4 (337%)              4.3 (426%)
8      3.3 (333%)              4.2 (420%)
9      2.4 (240%)              3.1 (310%)
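One reading of the "sum of %" measure in the table is that the shares of advertisements listing each requirement category are added up. Under that reading, which is an assumption of this sketch, the sum equals the average number of categories listed per advertisement, so 431% corresponds to roughly 4.3 categories per ad. The advertisements and category names below are made up.

```python
def sum_of_shares(ads, categories):
    """Sum over categories of the share of ads that list each category."""
    n = len(ads)
    if n == 0:
        raise ValueError("no advertisements given")
    return sum(sum(cat in ad for ad in ads) / n for cat in categories)


# Made-up ads, each represented by the set of requirement categories it lists.
ads = [
    {"education", "service", "experience"},
    {"education", "computer"},
    {"service"},
    {"education", "service", "computer", "experience"},
]
categories = ["education", "service", "computer", "experience"]

total = sum_of_shares(ads, categories)               # 2.5, i.e. 250%
mean_per_ad = sum(len(ad) for ad in ads) / len(ads)  # also 2.5
```

The two quantities coincide because each listed requirement contributes 1/n to exactly one category share.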
Skills requirements in the US
Education: 67% of vacancies require at least a high-school degree (45% - 83%)
For 28 occupations: > 50% of vacancies
For 19 occupations: > 65% of vacancies
But only 16% of job advertisements mention specialised training and licenses
Experience: 38% of vacancies (21% - 51%)
For 28 occupations: > 25% of vacancies
Never ranked higher than formal education
Non-cognitive skills: both social and personal skills matter
Service skills 49%, flexibility 33%, team work 31%, timeliness 27%, communication skills 23%
Much variation
Cognitive skills: specific skills are more relevant than generic skills
25% of vacancies refer to computer skills, 16% to language skills, 12% to analytical skills
Much variation
IT skills in the US
Working Paper: “The IT Skill Pyramid: A Study on the Demand for Computer Skills on the US Labour Market” (in progress)
Aim: to investigate importance of digital skills on US labour markets:
High unemployment yet many vacancies
Skill gaps and mismatches
Technological change
Methodology & Data: analyse about 2 million vacancies published on Burning Glass for the 30 most-frequently advertised occupations
→ keywords in vacancies
Main results:
IT skills are demanded in many vacancies, for occupations of various complexities
Hierarchical structure: basic > intermediate > advanced
IT skills in the US
Hierarchy of digital skills – Basic / Intermediate / Advanced
Recent study by Burning Glass identified productivity software skills, advanced digital skills and occupation-specific digital skills (focus on middle-skill jobs)
Our study considers low-, medium- and high-skilled occupations:
Basic and general digital skills
Intermediate digital skills (productivity software)
Advanced digital skills
Why the US?
Widespread use of computers and web (at home, work)
Supply of digital skills is falling behind:
Broad overall digital competences
Advanced skills in computer sciences and engineering
IT skills in the US
Basic and general digital skills: prevalence across all occupations
• Computer: 35%
• Software: 9%
• Hardware: 3%
• Internet / Web: 19%
• E-mail / Outlook: 22%
IT skills in the US
Intermediate digital skills: prevalence across all occupations
• Word / text processing / MS Word: 13%
• Spreadsheet / MS Excel: 14%
• MS PowerPoint: 3%
• Office Packages: 9%
• SAP: 1%
IT skills in the US
Advanced digital skills: Start from an extensive list of keywords:
CRM, databases and data management, data analysis and statistics, programming and programming languages, digital media and web design, desktop publishing, CMS, social media and blogging, SEO, …
Only very few vacancies refer to any of these skills (< 3% of job advertisements)
Exception: databases and data management (12%)
Higher prevalence for medium- to high-skilled office jobs: secretaries, office clerks, accountants, …
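The basic / intermediate / advanced hierarchy lends itself to a keyword lookup per tier. The sketch below shows the idea; the keyword lists are drawn loosely from the terms on these slides and are not the study's actual lists, and the function name is an assumption.

```python
import re

# Illustrative keyword lists only; not the lists used in the study.
TIERS = {
    "basic": ["computer", "software", "hardware", "internet", "web",
              "e-mail", "outlook"],
    "intermediate": ["word", "excel", "spreadsheet", "powerpoint",
                     "office", "sap"],
    "advanced": ["crm", "database", "programming", "seo", "cms"],
}


def tiers_mentioned(text):
    """Return the set of skill tiers with at least one keyword in the text.

    Matching is on whole words with an optional plural "s". Ambiguous
    keywords such as "word" or "office" will produce false positives,
    so a real analysis would need context rules or manual validation.
    """
    lowered = text.lower()
    found = set()
    for tier, keywords in TIERS.items():
        for kw in keywords:
            if re.search(r"\b" + re.escape(kw) + r"s?\b", lowered):
                found.add(tier)
                break  # one hit is enough for this tier
    return found


# Made-up advertisement text, for illustration only.
ad = "Proficiency in MS Excel and e-mail; experience with databases a plus."
found = tiers_mentioned(ad)
```

Aggregating the returned sets over all vacancies of an occupation gives the prevalence of each tier for that occupation.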
Upcoming: “New” occupations observatory
Aim: to identify new occupations using an innovative methodology that relies on the metadata of job boards instead of vacancies
Based on changes in the occupational classification of 11 job portals; the goal is to discover whether this approach could be adopted on a larger scale to identify new occupations and to further our understanding of the skills, education and other requirements that new occupations bring
2016 pilot: Belgium, the Czech Republic, Denmark, France, Germany, Hungary, Italy, Poland, Slovakia, Spain and the United Kingdom, which represent about 75% of the EU population
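Detecting changes in a portal's occupational classification can be as simple as comparing two dated snapshots of its category list. The sketch below illustrates that idea under assumptions of its own: the category names are made up, and the observatory's actual procedure is not described on the slide.

```python
def normalise(name):
    """Lowercase and collapse whitespace so trivial renames are ignored."""
    return " ".join(name.lower().split())


def new_categories(old_snapshot, new_snapshot):
    """Categories present in the newer snapshot but not in the older one."""
    old = {normalise(c) for c in old_snapshot}
    new = {normalise(c) for c in new_snapshot}
    return sorted(new - old)


# Made-up category lists from one hypothetical portal at two points in time.
cats_2014 = ["Accountant", "Web Developer", "Office clerk"]
cats_2016 = ["Accountant", "Web developer", "Office clerk", "Data Scientist"]

added = new_categories(cats_2014, cats_2016)  # ['data scientist']
```

A category that appears on several portals within a short period would be a stronger candidate for a genuinely new occupation than one added by a single portal.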