Multi-language CASCOT Margaret Birch and Ritva Ellison Institute for Employment Research.

Computer Assisted Structured Coding Tool

CASCOT

• Software tool for coding text automatically or manually

• Developed at the Institute for Employment Research, University of Warwick, since 1993

• Used by over 100 organisations in the UK and abroad

IER was contracted under the DASISH project to develop a multilingual version of CASCOT for coding job titles to ISCO-08.

This is a large task with limited resources, so it is a pilot project. The 8 selected languages are:

- Dutch (Netherlands, Flemish Belgium)
- English
- Finnish
- French (France, Walloon Belgium, Switzerland)
- German (Germany, Austria, Switzerland)
- Italian
- Slovak
- Spanish

Key tasks

• Translating the Cascot user interface texts
• Constructing national language versions of the ISCO-08 structure for Cascot
• Indexing job titles in the selected languages to ISCO-08
  - Some supplied by national statistical institutes (NSIs) or other partners
  - Some found by exploring relevant national websites
• Validating the software using raw data files from the European Social Survey (ESS) Round 6
• Testing the Cascot multilingual software
• Developing language-based coding rules
• Using the Cascot Performance Tool to fine-tune the software

Coding with Cascot

• Enter text (typed in or read from a file)
• Cascot recommends a code, but the user can change it
• Output can be directed to a file

[Screenshot: Cascot coding window, with the selected classification labelled]
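Cascot's own matching and scoring method is not described in these slides. Purely as a toy illustration of "recommend a code with a score, and let the user override it", the sketch below matches input words against a tiny hypothetical English index and scores candidates with a simple word-overlap (Dice) measure. `INDEX`, `recommend` and `code_record` are names invented here, and the scoring is an assumption, not Cascot's algorithm.

```python
import re
from typing import Optional

# Hypothetical miniature index (job title -> ISCO-08 unit group code).
# A real Cascot index holds many thousands of entries per language.
INDEX = {
    "primary school teacher": "2341",
    "secondary school teacher": "2330",
    "motorcycle driver": "8321",
    "cabinet maker": "7522",
}


def tokens(text: str) -> set[str]:
    """Lower-case the text and return its set of ASCII alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def recommend(text: str) -> tuple[Optional[str], int]:
    """Return (code, score 0-100) for the best-matching index entry.

    The score is a Dice overlap of word sets, chosen only for illustration;
    it is not Cascot's own scoring method. Ties keep the earlier entry.
    """
    words = tokens(text)
    best_code, best_score = None, 0
    for title, code in INDEX.items():
        entry = tokens(title)
        if not words or not entry:
            continue
        score = round(200 * len(words & entry) / (len(words) + len(entry)))
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score


def code_record(text: str, override: Optional[str] = None) -> tuple[Optional[str], int]:
    """Accept the recommended code unless the user supplies a different one."""
    code, score = recommend(text)
    return (override if override is not None else code), score
```

For example, `recommend("Secondary school teacher")` prefers the exact entry 2330 over the partial match 2341, and `code_record(..., override="2341")` keeps the user's choice while still reporting the score.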

Multi-language Cascot

• 8 languages available: Dutch, English, Finnish, French, German, Italian, Slovak and Spanish

• Cascot detects the language automatically, but it can be changed from the menu

• An ISCO-08 classification exists for each country (some with national codes)
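The slides do not say how Cascot detects the language. One simple approach, sketched here as an assumption, counts how many input words appear in each language's vocabulary and lets a menu choice override the guess. `VOCAB` and `detect_language` are hypothetical, and the word lists are tiny placeholders rather than real index content.

```python
import re
from typing import Optional

# Hypothetical per-language vocabularies, as if drawn from each language's index.
VOCAB = {
    "en": {"teacher", "driver", "manager", "engineer", "school"},
    "de": {"lehrer", "lehrerin", "fahrer", "ingenieur", "grundschule"},
    "nl": {"leraar", "chauffeur", "basisschool", "ingenieur"},
}


def detect_language(text: str, override: Optional[str] = None) -> Optional[str]:
    """Guess the language by counting input words found in each vocabulary.

    `override` plays the role of the user's menu choice and always wins.
    Returns None when no word is recognised; ties go to the earlier language.
    """
    if override is not None:
        return override
    words = re.findall(r"\w+", text.lower())
    counts = {lang: sum(w in vocab for w in words) for lang, vocab in VOCAB.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

Words shared between languages (here "ingenieur") show why an override is useful: the count alone cannot separate German from Dutch.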

Coding in Dutch

Coding in Finnish

Coding in French

Coding in German*

* The index is © Federal Employment Agency

Coding in Italian

Coding in Slovak

Coding in Spanish

A test of multi-language Cascot

• Comparison of European Social Survey (ESS) Round 6 ISCO-08 codes with automatic Cascot codes
• Data available from DE, ES, GB and NL

Cascot Performance Tool

Allows the user to analyse Cascot's performance by comparing manually coded data with the codes Cascot produces for the same data.

It needs a delimited results file containing a reference code, the Cascot code and the Cascot score.

The Tool shows a Performance Results Display window with a Performance Graph, Summary, Statistics and Key.
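A minimal sketch of reading such a results file and summarising it, assuming three columns in the order reference code, Cascot code, score, with no header row. `load_results` and `summarise` are hypothetical helpers; coverage and agreement per score threshold are one plausible summary, not necessarily what the Performance Tool's graph plots.

```python
import csv
from typing import Iterable


def load_results(lines: Iterable[str], delimiter: str = ",") -> list[tuple[str, str, int]]:
    """Parse (reference code, Cascot code, Cascot score) rows; column order is assumed."""
    rows = []
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        ref, code, score = row[:3]
        rows.append((ref.strip(), code.strip(), int(score)))
    return rows


def summarise(rows: list[tuple[str, str, int]], threshold: int) -> tuple[float, float]:
    """Return (share of records scored at or above threshold, agreement among them)."""
    kept = [(ref, code) for ref, code, score in rows if score >= threshold]
    coverage = len(kept) / len(rows) if rows else 0.0
    agreement = sum(ref == code for ref, code in kept) / len(kept) if kept else 0.0
    return coverage, agreement
```

With a file opened as `open(path, newline="")` and passed to `load_results`, calling `summarise(rows, t)` for `t` in `range(0, 101, 10)` shows the usual trade-off: raising the threshold codes fewer records automatically, but those it keeps should agree more often with the reference codes.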

Opening a results file

Performance Results Display

• The longer the green line stays high, the better

• The further to the right the purple/blue lines are, the better

Fine-tuning multi-language Cascot

• The versions in different languages could be improved by developing coding rules

• Contributions are needed from experts who know the language

• Rules are developed with Cascot Editor

Cascot Editor

• Classification files for Cascot are created and modified with the Editor
• Each classification has a Structure, an Index and Rules for coding
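As a rough picture of what a classification holds, the sketch below models the structure, index and rules as plain dictionaries. The class and field names are hypothetical and say nothing about Cascot's real file format; the `ancestry` method relies only on the fact that ISCO-08 codes nest by prefix.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Toy model of a classification: structure, index and rules.

    Field names and layout are hypothetical, not Cascot's file format.
    """
    structure: dict[str, str]                                       # code -> title, every level
    index: dict[str, str] = field(default_factory=dict)             # job title -> unit group code
    rules: dict[str, dict[str, str]] = field(default_factory=dict)  # rule type -> mapping

    def ancestry(self, code: str) -> list[tuple[str, str]]:
        """Return (code, title) pairs from the major group down to `code`.

        Relies on ISCO-08 codes nesting by prefix: 1 digit = major group,
        2 = sub-major group, 3 = minor group, 4 = unit group.
        """
        return [
            (code[:n], self.structure[code[:n]])
            for n in range(1, len(code) + 1)
            if code[:n] in self.structure
        ]
```

Keeping every level in one `structure` mapping lets an index hit on a four-digit code be reported together with its sub-major and major groups.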

Cascot Editor Rules

• Downgraded words: words that are considered significantly less important than other words, e.g. deputy, junior, person
• Equivalent word ends: wait|er, wait|ress
• Abbreviations: asst → assistant, fe → further education
• Replacement words: taylor → tailor, tesco → supermarket
  – Omitting noise words, e.g. replace ‘part-time’ with nothing
• Input modifications: used only when the rule absolutely cannot be made elsewhere
• Word alternatives: words and phrases that should also be tried as possible solution candidates
• Conclusions: e.g. ‘retired’: cannot conclude; ‘agent’: ambiguous (score 39)
• Default coding: a set of words and phrases that should be scored as though they were a different word or phrase
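To make some of these rule types concrete, the sketch below applies the slide's own examples for abbreviations, replacement words (including a noise word), equivalent word ends and downgraded words to a job title before matching. The table names, the order in which rules are applied and the 0.5 weight for downgraded words are assumptions; input modifications, word alternatives, conclusions and default coding are not modelled.

```python
import re

# Hypothetical rule tables built from the slide's examples.
ABBREVIATIONS = {"asst": "assistant", "fe": "further education"}
REPLACEMENTS = {"taylor": "tailor", "tesco": "supermarket", "part-time": ""}  # "" = noise word
EQUIVALENT_ENDINGS = {"wait": ("er", "ress")}  # wait|er and wait|ress are treated alike
DOWNGRADED = {"deputy", "junior", "person"}
DOWNGRADE_WEIGHT = 0.5  # assumed weight; Cascot's actual weighting is not given here


def apply_rules(text: str) -> tuple[list[str], list[float]]:
    """Normalise a job title and return (words, per-word weights)."""
    words: list[str] = []
    for word in re.findall(r"[\w-]+", text.lower()):
        word = ABBREVIATIONS.get(word, word)
        word = REPLACEMENTS.get(word, word)
        words.extend(word.split())  # expands multi-word forms; "" adds nothing
    normalised = []
    for word in words:
        for stem, endings in EQUIVALENT_ENDINGS.items():
            if word in (stem + ending for ending in endings):
                word = stem + endings[0]  # map every variant to the first ending
        normalised.append(word)
    weights = [DOWNGRADE_WEIGHT if w in DOWNGRADED else 1.0 for w in normalised]
    return normalised, weights
```

For instance, "Part-time asst waitress" becomes `["assistant", "waiter"]`: the noise word disappears, the abbreviation is expanded and the two word ends collapse to one form, so both titles can hit the same index entry.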

Example of a new rule - English

• The problem:

• Add two new Replacement Words rules:

• The result:

Potential for rules - German

Each example gives the text to be coded, the approved ISCO-08 (ESCO) code, and the ISCO-08 code assigned by Cascot with its score and best-matching index entry:

• Klassenlehrer/in (Klasse 1-3): ESCO 2341 Lehrkräfte im Primarbereich; Cascot 2330 Lehrkräfte im Sekundarbereich, score 73, index entry Klassenlehrer/in
• Diplomingenieur/in (Fahrzeugbau): ESCO 2144 Maschinenbauingenieure; Cascot 7231 Kraftfahrzeugmechaniker und -schlosser, score 52, index entry Fahrzeugbauer/in
• Mopedbote/-in: ESCO 8321 Kraftradfahrer; Cascot 7522 Möbeltischler und verwandte Berufe, score 34, index entry Büchsenschäfter/in/in
• Rampenpersonal: ESCO 9333 Frachtarbeiter und verwandte Berufe; Cascot 4323 Bürokräfte in der Transportwirtschaft und verwandte Berufe, score 27, index entry Rampenmanager/in
• Maniküre: ESCO 5142 Kosmetiker und verwandte Berufe; Cascot ---- No conclusion, score 0

• German occupational titles were coded fully automatically with Cascot, and the results were compared with approved codes. The examples above show cases where rules would improve Cascot's coding performance.

• It is helpful to have “gold standard” files containing a large number of real-life job titles to which experts have assigned the correct codes.

• Cascot's coding results can be compared with the “gold standard” to find areas for improvement.
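Because ISCO-08 codes are hierarchical, a comparison with a gold standard can report agreement at each level rather than only for the full four-digit code. A minimal sketch, with a hypothetical function name, counting a pair as agreeing at level n when the first n digits match:

```python
def agreement_by_level(pairs: list[tuple[str, str]]) -> dict[int, float]:
    """Share of (gold standard, Cascot) code pairs that agree at each ISCO-08 level.

    Level n compares the first n digits (1 = major group ... 4 = unit group).
    A Cascot code with no conclusion (e.g. "----" or "") never counts as agreeing.
    """
    result = {}
    for n in range(1, 5):
        hits = sum(
            1
            for gold, code in pairs
            if len(code) >= n and code[:n].isdigit() and gold[:n] == code[:n]
        )
        result[n] = hits / len(pairs) if pairs else 0.0
    return result
```

Applied to the five German examples, `[("2341", "2330"), ("2144", "7231"), ("8321", "7522"), ("9333", "4323"), ("5142", "----")]`, it returns `{1: 0.2, 2: 0.2, 3: 0.0, 4: 0.0}`: only the Klassenlehrer/in example lands in the right major and sub-major group, which points to where new rules would help most.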