Presented at GOR 24 on 22 February 2024 together with Verian Group (formerly Kantar Public). We discuss how AI technology was used for the coding and classification of open-ended, free-text questions, using ISCO-08 job descriptions as an example. The talk covers the technical implementation and evaluation, as well as business process integration and a summary of business benefits.
This work is part of Verian's Public Data Innovation Hub (https://www.kantarpublic.com/de/Unsere-Expertise/daten-und-fakten/public-data-innovation-hub) and was presented in the session 'Large, Larger, LLMs' at the General Online Research Conference 2024 (https://www.conftool.org/gor24/index.php?page=browseSessions&form_session=103). Supporting write-ups are available at https://www.inspirient.com/case-studies/survey_analysis_automation.php and https://www.inspirient.com/case-studies/survey_quality_assurance.php
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
1. Verian Inspirient
Patrick Mertes ▪ Inspirient
Sophie Tschersich ▪ Verian Group
General Online Research Conference 2024 (GOR 24)
Cologne, 22 February 2024
2. Verian Inspirient
Why is it necessary to classify job titles?
• Essential Data: Job titles are gathered in nearly all empirical social research studies
• Analytical Necessity: Classifying occupational groups can be crucial for analyzing survey data
• Standardization Systems: Two common systems are used for occupation coding in Germany:
• ISCO (international): Ensures international comparability, governed by the ILO
• KldB (national): Provides national comparability, administered by the Federal Statistical Office
How was this done without neural networks?
• Traditional Approach: Utilized lookup tables for exact matching with official directories
• Inherent Limitations: Variations in spelling, paraphrasing, and typos required extensive manual review
Relevance and Research Question
The Need to Automate Free Text Classification
A time-consuming and labour-intensive task
that easily suffers from inconsistencies due to differences in expertise
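The limitation of exact matching described on this slide can be illustrated with a minimal sketch. The directory excerpt, the function name `lookup_exact`, and the sample answers are hypothetical; only the codes 7536 and 5412 appear elsewhere in this deck.

```python
from typing import Optional

# Hypothetical excerpt of an official directory: job title -> ISCO-08 unit group.
DIRECTORY = {
    "shoemaker": "7536",
    "police officer": "5412",
}


def lookup_exact(title: str) -> Optional[str]:
    """Return a code only if the lower-cased, trimmed title matches exactly."""
    return DIRECTORY.get(title.strip().lower())


# Made-up survey answers: one clean title, then a spacing variant,
# a paraphrase, and a typo.
answers = [
    "Shoemaker",
    "shoe maker",
    "Shoemaker (self-employed)",
    "Polcie officer",
]

# Every answer without an exact match falls back to manual review.
needs_review = [a for a in answers if lookup_exact(a) is None]
print(needs_review)
```

In this toy case three of the four answers end up in manual review, which is the effort the automated approach aims to reduce.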
3. Verian Inspirient
Hierarchical Structure (Count): Example 1 (Family medical practitioner); Example 2 (Shoemaker)
• Major Groups (10): 2 - Professionals; 7 - Craft and related trades workers
• Sub-Major Groups (43): 22 - Health Professionals; 75 - Food processing, wood working, garment and other craft and related trades workers
• Minor Groups (130): 221 - Medical Doctors; 753 - Garment and Related Trades Workers
• Unit Groups (436), Example 1:
• 2211 - Generalist Medical Practitioners
• 2212 - Specialist Medical Practitioners
• Unit Groups (436), Example 2:
• 7531 - Tailors, Dressmakers, Furriers and Hatters
• 7532 - Garment and Related Patternmakers and Cutters
• 7533 - Sewing, Embroidery and Related Workers
• 7534 - Upholsterers and Related Workers
• 7535 - Pelt Dressers, Tanners and Fellmongers
• 7536 - Shoemakers and Related Workers
ISCO-08: Framework for organizing information on labour and jobs globally,
with worldwide acceptance for categorizing job titles
Quick Example
ISCO-08 in a Nutshell
This is what we need to classify
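Because each ISCO-08 level extends the code of its parent by one digit, the hierarchy can be read directly off a four-digit unit group code. The following is a minimal sketch of that prefix structure; the function name `isco08_levels` is hypothetical.

```python
def isco08_levels(unit_group: str) -> dict:
    """Split a four-digit ISCO-08 unit group code into its hierarchy levels.

    Codes are handled as strings so that a leading zero is preserved.
    """
    if len(unit_group) != 4 or not unit_group.isdigit():
        raise ValueError("expected a four-digit ISCO-08 unit group code")
    return {
        "major_group": unit_group[:1],
        "sub_major_group": unit_group[:2],
        "minor_group": unit_group[:3],
        "unit_group": unit_group,
    }


# Example 2 from the slide: Shoemakers and Related Workers.
levels = isco08_levels("7536")
print(levels)

# Both unit groups of Example 1 share the minor group 221 (Medical Doctors).
same_minor = (
    isco08_levels("2211")["minor_group"] == isco08_levels("2212")["minor_group"]
)
print(same_minor)
```

One consequence of this structure is that a prediction with the wrong unit group can still be correct at a coarser level, which may be useful when reviewing suggestions.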
4. Verian Inspirient
Goal
Increased efficiency
Increased consistency through recommendations
Retain highest accuracy
Integration into existing workflow
Fewer than 50% of recommended codes need a manual decision
More stable project planning and calculation
Goals
Aiming for Efficiency and Consistency
5. Verian Inspirient
Available Data
• Historical data from projects of several past years
• Around 100,000 distinct German job titles
Difficulties for Machine Learning (examples)
• Unique: Teilzeitarbeitsverhältnis im Kino Cineplex Germering, Service-Mitarbeiter hinter der Theke
zum Essen ausgeben & Sääle säubern
• Very similar, but different class: Lehrerin, Fahrlehrer BE, Grundschullehrerin, Förderlehrerin,
Klavierlehrerin, etc.
• Multiple jobs with different classes: 1. Fitnesstrainierin (Aerobickurse) 2. Gymnasiallehrerin (Sport
und Deutsch), Schulleitung und Lehrerin Grundschule - Verwaltungsaufgaben, etc.
• Gender-specific spelling variations: Lehrender, Lehrerin, Lehrer; Kaufmann, Kauffrau; Koch, Köchin
Machine Learning Setup
From Job Titles to Training Data
6. Verian Inspirient
Technical Dataflow Pipeline
AI Classification Steps
Input → Machine Learning Pipeline → Output
Machine Learning Pipeline: Pre-processing → Hashing Vectorizer → Multi-Layer Perceptron
Examples (input → output):
• Police officer → 5412 (99% confidence) Police Officer
• Head teacher of an elementary school → 1345 (99% confidence) Education Managers
• was head of a department until 1.6.2022; after restructuring I now have to wait and see what happens next; → 1349 (17% confidence) Professional Services Managers Not Elsewhere Classified
• Part-time job at the Cineplex Germering cinema, service employee behind the counter serving food & cleaning the auditoriums. → 5131 (49% confidence) Waiters
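The first two pipeline stages can be sketched in plain Python. The slide names the stages only, so everything specific here is an assumption: the normalisation rules, the use of character trigrams, the feature-space size `N_FEATURES`, and all function names. The resulting sparse vectors would then serve as input to the multi-layer perceptron, which is not shown.

```python
import hashlib
import math
import re

N_FEATURES = 2 ** 12  # hypothetical size of the hashed feature space


def preprocess(text: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text: str, n: int = 3) -> list:
    """Character n-grams of the text, padded with one space on each side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hash_vectorize(text: str, n_features: int = N_FEATURES) -> dict:
    """Map text to a sparse count vector {feature index: count} via hashing."""
    vector = {}
    for gram in char_ngrams(preprocess(text)):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_features
        vector[index] = vector.get(index, 0) + 1
    return vector


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors."""
    dot = sum(count * b.get(index, 0) for index, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Spelling variants share most trigrams, unrelated titles share almost none.
print(cosine(hash_vectorize("Lehrer"), hash_vectorize("Lehrerin")))
print(cosine(hash_vectorize("Lehrer"), hash_vectorize("Koch")))
```

Hashing needs no stored vocabulary, so previously unseen titles, typos, and gender-specific variants still map to features that overlap with known titles.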
7. Verian Inspirient
Focus on pre-trained classifiers
Evaluation of Classification Accuracy
Approach
1) Train classifiers with a selection of Verian’s historic manual classifications
2) Apply classifier to other datasets to test real-world applicability [1]
Classification System: ISCO-08 (International Standard Classification of Occupations)
Verian GER Evaluation Datasets
• Project_A.sav: Accuracy 83%, Coverage 86%
• Project_B.sav: Accuracy 83%, Coverage 84%
• Project_C.sav: Accuracy 83%, Coverage 88%
86% of the data with 83% accuracy
[1] Classifiers configured to only report classifications with >40% confidence for this evaluation
But the labels should be 100% accurate. How do we know when to trust the AI?
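The two evaluation metrics can be computed as in the following minimal sketch. It assumes that coverage is the share of answers for which the classifier reports a code (confidence above the cut-off) and that accuracy is the share of those reported codes that agree with the manual classification. The slides do not define the metrics explicitly. The manual codes in `sample` are made up for illustration.

```python
def accuracy_and_coverage(predictions, threshold=0.40):
    """Compute (accuracy, coverage) at a given confidence cut-off.

    predictions: list of (predicted_code, confidence, manual_code) tuples.
    """
    reported = [(pred, manual) for pred, conf, manual in predictions
                if conf > threshold]
    coverage = len(reported) / len(predictions)
    if not reported:
        return float("nan"), coverage
    accuracy = sum(pred == manual for pred, manual in reported) / len(reported)
    return accuracy, coverage


# Predicted codes and confidences follow the pipeline examples in this deck;
# the manual codes in the third position are hypothetical.
sample = [
    ("5412", 0.99, "5412"),
    ("1345", 0.99, "1345"),
    ("5131", 0.49, "5246"),
    ("1349", 0.17, "1349"),
]

accuracy, coverage = accuracy_and_coverage(sample)
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```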
8. Verian Inspirient
Trading lower coverage for increased accuracy
Deep-dive: Accuracy vs. Coverage
[Chart: ISCO-08 accuracy (y-axis, 50% to 100%) against ISCO-08 coverage (x-axis, 50% to 100%) for Project_A.sav]
Increasing confidence cut-offs
decreases coverage, but
increases accuracy
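The trade-off can be traced by sweeping the confidence cut-off, as in this minimal sketch. The toy data, the labels "A" and "B", and the function names are hypothetical. Accuracy and coverage follow the same assumed definitions as in the evaluation, measured on the answers whose confidence exceeds the cut-off.

```python
def tradeoff_curve(predictions, cutoffs):
    """Return a list of (cutoff, accuracy, coverage) for each cut-off.

    predictions: list of (predicted_code, confidence, manual_code) tuples.
    Cut-offs that leave no reported answers are skipped.
    """
    curve = []
    for cutoff in cutoffs:
        kept = [(pred, manual) for pred, conf, manual in predictions
                if conf > cutoff]
        if not kept:
            continue
        coverage = len(kept) / len(predictions)
        accuracy = sum(pred == manual for pred, manual in kept) / len(kept)
        curve.append((cutoff, accuracy, coverage))
    return curve


def lowest_cutoff_for(curve, target_accuracy):
    """First point on the curve that reaches the target accuracy, or None."""
    for point in curve:
        if point[1] >= target_accuracy:
            return point
    return None


# Made-up predictions; errors are more frequent at low confidence.
toy = [
    ("A", 0.95, "A"), ("B", 0.90, "B"), ("A", 0.80, "A"), ("B", 0.70, "A"),
    ("A", 0.60, "A"), ("B", 0.45, "A"), ("A", 0.30, "B"), ("B", 0.20, "B"),
]

curve = tradeoff_curve(toy, cutoffs=[0.1, 0.4, 0.5, 0.75])
for cutoff, accuracy, coverage in curve:
    print(f"cut-off {cutoff:.2f}: accuracy {accuracy:.2f}, coverage {coverage:.2f}")

best = lowest_cutoff_for(curve, target_accuracy=0.8)
```

On real data, accuracy need not rise at every step, so the curve is worth inspecting per dataset before a cut-off is fixed.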
9. Verian Inspirient
For Real Efficiency Gains, Classifier Results Are Tightly Integrated into Existing Workflow
Including per-interview recommended action and suggested classifications
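A per-interview recommendation could be derived from the classifier confidence roughly as follows. This is a sketch under stated assumptions: the action labels, the `Recommendation` record, and the 0.90 threshold are hypothetical, and only the 40% reporting cut-off is taken from the evaluation slide. A human coder still makes the final decision in every case.

```python
from dataclasses import dataclass
from typing import Optional

CONFIRM_MIN = 0.90  # hypothetical threshold for a quick confirmation
REPORT_MIN = 0.40   # reporting cut-off used in the evaluation


@dataclass
class Recommendation:
    interview_id: str
    suggested_code: Optional[str]
    confidence: float
    action: str


def recommend(interview_id: str, code: str, confidence: float) -> Recommendation:
    """Attach a recommended action to one classified interview answer."""
    if confidence >= CONFIRM_MIN:
        return Recommendation(interview_id, code, confidence, "confirm suggestion")
    if confidence > REPORT_MIN:
        return Recommendation(interview_id, code, confidence, "review suggestion")
    # Below the reporting cut-off no code is suggested at all.
    return Recommendation(interview_id, None, confidence, "code manually")


# Codes and confidences from the pipeline examples; interview IDs are made up.
results = [
    recommend("int-001", "5412", 0.99),
    recommend("int-002", "5131", 0.49),
    recommend("int-003", "1349", 0.17),
]
for r in results:
    print(r.interview_id, r.suggested_code, r.action)
```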
10. Verian Inspirient
11. Verian Inspirient
Business Impact
Six Key Benefits
Goal
Increased efficiency
Increased consistency through recommendations
Retain highest accuracy
Integration into existing workflow
Fewer than 50% of recommended codes need a manual decision
More stable project planning and calculation
Manual decisions still required, also as per client mandate, but much faster
12. Verian Inspirient
Potential improvements to NN classifiers:
• Continuous Improvement: Accuracy enhancement through
ongoing learning from new data and corrections.
• Training for Coders: Up-skilling manual coders with datasets for
improved efficiency and accuracy.
• Broader Implementation: Extending the model to additional
national and international classifications, such as ISCED and KldB 2010.
First tests show that the same approach works very well.
Next steps to further advance this method
Future Work
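• KldB 2010 (Klassifikation der Berufe, the German Classification of Occupations): Accuracy 82%, Coverage 89%
• WZ 2008 (Klassifikation der Wirtschaftszweige, the German Classification of Economic Activities): Accuracy 89%, Coverage 90%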