SlideShare une entreprise Scribd logo
1  sur  13
Télécharger pour lire hors ligne
Verian Inspirient
Patrick Mertes ▪ Inspirient
Sophie Tschersich ▪ Verian Group
General Online Research Conference 2024 (GOR 24)
Cologne, 22 February 2024
Free Text Classification
with Neural Networks:
Training, Process Integration and
Results for ISCO-08 Job Titles
Verian Inspirient
Why is it necessary to classify job titles?
• Essential Data: Job titles are gathered in nearly all empirical social research studies
• Analytical Necessity: Classifying occupational groups can be crucial for analyzing survey data
• Standardization Systems: Two common systems are used for occupation coding in Germany:
• ISCO (international): Ensures international comparability, governed by the ILO
• KldB (national): Provides national comparability, administered by the Federal Statistical Office
How was this done without neural networks?
• Traditional Approach: Utilized lookup tables for exact matching with official directories
• Inherent Limitations: Variations in spelling, paraphrasing, and typos required extensive manual review
Relevance and Research Question
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
The Need to Automate Free Text Classification
22 Feb 2024 1
A time-consuming and labour-intensive task
that easily suffers from inconsistencies due to differences in expertise
Verian Inspirient
Hierarchical Structure Count Example 1 Example 2
Family medical practitioner Shoemaker
Major Groups 10 2 - Professionals 7 - Craft and related trades workers
Sub-Major Groups 43 22 - Health Professionals 75 - Food processing, wood working, garment and other craft
and related trades workers
Minor Groups 130 221 - Medical Doctors 753 - Garment and Related Trades Workers
Unit Groups 436 2211 - Generalist Medical Practitioners
2212 - Specialist Medical Practitioners
7531 - Tailors, Dressmakers, Furriers and Hatters
7532 - Garment and Related Patternmakers and Cutters
7533 - Sewing, Embroidery and Related Workers
7534 - Upholsterers and Related Workers
7535 - Pelt Dressers, Tanners and Fellmongers
7536 Shoemakers and Related Workers
ISCO-08: Framework for organizing information on labour and jobs globally,
with worldwide acceptance for categorizing job titles
Quick Example
2
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
ISCO-08 in a Nutshell
22 Feb 2024
This is what we need to classify
Verian Inspirient
Goal
Increased efficiency
Increased consistency through recommendations
Retain highest accuracy
Integration into existing workflow
Less than 50% recommended codes need manual decision
More stable project planning and calculation
Goals
3
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
Aiming for Efficiency and Consistency
22 Feb 2024
Verian Inspirient
Available Data
• Historical data from several past years projects
• Around 100.000 distinct German job titles
Difficulties for Machine Learning (examples)
• Unique: Teilzeitarbeitsverhältnis im Kino Cineplex Germering, Service-Mitarbeiter hinter der Theke
zum Essen ausgeben & Sääle säubern
• Very similar, but different class: Lehrerin, Fahrlehrer BE, Grundschullehrerin, Förderlehrerin,
Klavierlehrerin, etc.
• Multiple jobs with different classes: 1. Fitnesstrainierin (Aerobickurse) 2. Gymnasiallehrerin (Sport
und Deutsch), Schulleitung und Lehrerin Grundschule - Verwaltungsaufgaben, etc.
• Gender-specific spelling variations: Lehrender, Lehrerin, Lehrer; Kaufmann, Kauffrau; Koch, Köchin
Machine Learning Setup
4
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
From Job Titles to Training Data
22 Feb 2024
Verian Inspirient
Technical Dataflow Pipeline
5
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
AI Classification Steps
22 Feb 2024
Police officer
Head teacher of an
elementary school
was head of a department
until 1.6.2022; after
restructuring I now have to
wait and see what happens
next;
Part-time job at the Cineplex
Germering cinema, service
employee behind the
counter serving food &
cleaning the auditoriums.
5412 (99% confidence)
Police Officer
1349 (17% confidence)
Professional Services Managers Not Elsewhere
Classified
5131 (49% confidence)
Waiters
1345 (99% confidence)
Education Managers
Input Machine Learning Pipeline Output
Pre-
processing
Multi-Layer
Perceptron
Hashing
Vectorizer
Verian Inspirient
Focus on pre-trained classifiers
6
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
Evaluation of Classification Accuracy
22 Feb 2024
• Project_A.sav
• Project_B.sav
• Project_C.sav
Accuracy 83%
Coverage 86%
Accuracy 83%
Coverage 84%
Accuracy 83%
Coverage 88%
Classification System
ISCO-08
International Standard
Classification of
Occupations
Verian GER Evaluation Datasets
1. Classifiers configured to only report classifications with >40% confidence for this evaluation
86% of the data
with 83 % accuracy
Approach
1) Train classifiers with selection of Verian’s
historic manual classifications
2) Apply classifier to other datasets to test
real-world applicability1
But the labels should be 100%
accurate. How do we know
when to trust the AI?
Verian Inspirient
Trading lower coverage for increased accuracy
7
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
Deep-dive: Accuracy vs. Coverage
22 Feb 2024
50%
60%
70%
80%
90%
100%
50% 60% 70% 80% 90% 100%
ISCO-08
Accuracy
ISCO-08 Coverage
Project_A.sav
Increasing confidence cut-offs
decreases coverage, but
increases accuracy
Verian Inspirient 8
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles 22 Feb 2024
incl. per-interview
recommended action
and suggested
classifications
For Real Efficiency Gains, Classifier Results
Are Tightly Integrated into Existing Workflow
Verian Inspirient 9
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles 22 Feb 2024
Verian Inspirient
Business Impact
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
Six Key Benefits
22 Feb 2024 10
Goal
Increased efficiency
Increased consistency through recommendations
Retain highest accuracy
Integration into existing workflow
Less than 50% recommended codes need manual decision
More stable project planning and calculation
Manual decisions still
required, also as per client
mandate, but much faster
Verian Inspirient
Potential improvements to NN classifiers:
• Continuous Improvement: Accuracy enhancement through
ongoing learning from new data and corrections.
• Training for Coders: Up-skilling manual coders with datasets for
improved efficiency and accuracy.
• Broader Implementation: Extending the model to additional
national and international classifications, like ISCED and KldB2010.
First tests show that the same approach works very well.
Next steps to further advance this method
Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
Future Work
22 Feb 2024 11
Accuracy 82%
Coverage 89%
Accuracy 89%
Coverage 90%
KldB 2010
Klassifikation der Berufe
WZ 2008
Klassifikation der
Wirtschaftszweige
Verian Inspirient
Contact Information
Patrick Mertes ▪ patrick.mertes@inspirient.com
Sophie Tschersich ▪ sophie.tschersich@veriangroup.com

Contenu connexe

Similaire à Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles

DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that MatterDOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that MatterGene Kim
 
Moving Up the PVC Maturity Curve in Industrial Manufacturing
Moving Up the PVC Maturity Curve in Industrial ManufacturingMoving Up the PVC Maturity Curve in Industrial Manufacturing
Moving Up the PVC Maturity Curve in Industrial ManufacturingZero Wait-State
 
Oracle business analytics and endeca approach Document
Oracle business analytics and endeca approach DocumentOracle business analytics and endeca approach Document
Oracle business analytics and endeca approach DocumentNitai Partners Inc
 
Why Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational ChallengesWhy Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational ChallengesEnterprise Management Associates
 
Con8154 controlling for multiple erp systems with oracle advanced controls
Con8154 controlling for multiple erp systems with oracle advanced controlsCon8154 controlling for multiple erp systems with oracle advanced controls
Con8154 controlling for multiple erp systems with oracle advanced controlsOracle
 
Customers talk about controlling access for multiple erp systems with oracle ...
Customers talk about controlling access for multiple erp systems with oracle ...Customers talk about controlling access for multiple erp systems with oracle ...
Customers talk about controlling access for multiple erp systems with oracle ...Oracle
 
Impact of Juniper Training and Certification on Network Management Activities
Impact of Juniper Training and Certification on Network Management ActivitiesImpact of Juniper Training and Certification on Network Management Activities
Impact of Juniper Training and Certification on Network Management ActivitiesJuniper Networks
 
CRJS466 – Psychopathology and CriminalityUnit 5 Individual Proje.docx
CRJS466 – Psychopathology and CriminalityUnit 5 Individual Proje.docxCRJS466 – Psychopathology and CriminalityUnit 5 Individual Proje.docx
CRJS466 – Psychopathology and CriminalityUnit 5 Individual Proje.docxfaithxdunce63732
 
Building a 360 Degree View of Your Customers on BICS
Building a 360 Degree View of Your Customers on BICSBuilding a 360 Degree View of Your Customers on BICS
Building a 360 Degree View of Your Customers on BICSPerficient, Inc.
 
Dev Dives: Fast-track time to value with UiPath Solution Accelerators
Dev Dives: Fast-track time to value with UiPath Solution AcceleratorsDev Dives: Fast-track time to value with UiPath Solution Accelerators
Dev Dives: Fast-track time to value with UiPath Solution AcceleratorsUiPathCommunity
 
How SQL Change Automation helps you deliver value faster
How SQL Change Automation helps you deliver value fasterHow SQL Change Automation helps you deliver value faster
How SQL Change Automation helps you deliver value fasterRed Gate Software
 
Veeva Systems Webinar: Driving Continuous Quality Improvements
Veeva Systems Webinar: Driving Continuous Quality ImprovementsVeeva Systems Webinar: Driving Continuous Quality Improvements
Veeva Systems Webinar: Driving Continuous Quality ImprovementsVeeva Systems
 
School performance management analytics in cloud
School performance management analytics in cloudSchool performance management analytics in cloud
School performance management analytics in cloudNitai Partners Inc
 
IMPLEMENTATION BEST PRACTICES Sep 22.pdf
IMPLEMENTATION BEST PRACTICES Sep 22.pdfIMPLEMENTATION BEST PRACTICES Sep 22.pdf
IMPLEMENTATION BEST PRACTICES Sep 22.pdfudayabhaskar42
 
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...Curiosity Software Ireland
 
Why Are Life Science Companies Moving to Office 365?
Why Are Life Science Companies Moving to Office 365?Why Are Life Science Companies Moving to Office 365?
Why Are Life Science Companies Moving to Office 365?Montrium
 
Chapter 1. Introduction to Software Engineering.pptx
Chapter 1. Introduction to Software Engineering.pptxChapter 1. Introduction to Software Engineering.pptx
Chapter 1. Introduction to Software Engineering.pptxgadisaAdamu
 

Similaire à Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles (20)

DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that MatterDOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
DOES14 - Stephen Elliot - IDC - Delivering DevOps Business Metrics that Matter
 
Moving Up the PVC Maturity Curve in Industrial Manufacturing
Moving Up the PVC Maturity Curve in Industrial ManufacturingMoving Up the PVC Maturity Curve in Industrial Manufacturing
Moving Up the PVC Maturity Curve in Industrial Manufacturing
 
Oracle business analytics and endeca approach Document
Oracle business analytics and endeca approach DocumentOracle business analytics and endeca approach Document
Oracle business analytics and endeca approach Document
 
Why Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational ChallengesWhy Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational Challenges
 
Con8154 controlling for multiple erp systems with oracle advanced controls
Con8154 controlling for multiple erp systems with oracle advanced controlsCon8154 controlling for multiple erp systems with oracle advanced controls
Con8154 controlling for multiple erp systems with oracle advanced controls
 
Customers talk about controlling access for multiple erp systems with oracle ...
Customers talk about controlling access for multiple erp systems with oracle ...Customers talk about controlling access for multiple erp systems with oracle ...
Customers talk about controlling access for multiple erp systems with oracle ...
 
DevOps 2021 Research
DevOps 2021 ResearchDevOps 2021 Research
DevOps 2021 Research
 
Impact of Juniper Training and Certification on Network Management Activities
Impact of Juniper Training and Certification on Network Management ActivitiesImpact of Juniper Training and Certification on Network Management Activities
Impact of Juniper Training and Certification on Network Management Activities
 
CRJS466 – Psychopathology and CriminalityUnit 5 Individual Proje.docx
CRJS466 – Psychopathology and CriminalityUnit 5 Individual Proje.docxCRJS466 – Psychopathology and CriminalityUnit 5 Individual Proje.docx
CRJS466 – Psychopathology and CriminalityUnit 5 Individual Proje.docx
 
Building a 360 Degree View of Your Customers on BICS
Building a 360 Degree View of Your Customers on BICSBuilding a 360 Degree View of Your Customers on BICS
Building a 360 Degree View of Your Customers on BICS
 
Dev Dives: Fast-track time to value with UiPath Solution Accelerators
Dev Dives: Fast-track time to value with UiPath Solution AcceleratorsDev Dives: Fast-track time to value with UiPath Solution Accelerators
Dev Dives: Fast-track time to value with UiPath Solution Accelerators
 
How SQL Change Automation helps you deliver value faster
How SQL Change Automation helps you deliver value fasterHow SQL Change Automation helps you deliver value faster
How SQL Change Automation helps you deliver value faster
 
Veeva Systems Webinar: Driving Continuous Quality Improvements
Veeva Systems Webinar: Driving Continuous Quality ImprovementsVeeva Systems Webinar: Driving Continuous Quality Improvements
Veeva Systems Webinar: Driving Continuous Quality Improvements
 
School performance management analytics in cloud
School performance management analytics in cloudSchool performance management analytics in cloud
School performance management analytics in cloud
 
IMPLEMENTATION BEST PRACTICES Sep 22.pdf
IMPLEMENTATION BEST PRACTICES Sep 22.pdfIMPLEMENTATION BEST PRACTICES Sep 22.pdf
IMPLEMENTATION BEST PRACTICES Sep 22.pdf
 
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
Curiosity and Xray present - In sprint testing: Aligning tests and teams to r...
 
Tomi resume
Tomi resumeTomi resume
Tomi resume
 
Why Are Life Science Companies Moving to Office 365?
Why Are Life Science Companies Moving to Office 365?Why Are Life Science Companies Moving to Office 365?
Why Are Life Science Companies Moving to Office 365?
 
Chapter 1. Introduction to Software Engineering.pptx
Chapter 1. Introduction to Software Engineering.pptxChapter 1. Introduction to Software Engineering.pptx
Chapter 1. Introduction to Software Engineering.pptx
 
Azure Engineering MLOps
Azure Engineering MLOps Azure Engineering MLOps
Azure Engineering MLOps
 

Plus de Inspirient

Inspirient's Pitch for the MRS Research Live Innovation of the Year Award 2023
Inspirient's Pitch for the MRS Research Live Innovation of the Year Award 2023Inspirient's Pitch for the MRS Research Live Innovation of the Year Award 2023
Inspirient's Pitch for the MRS Research Live Innovation of the Year Award 2023Inspirient
 
Trustworthy Analytics with Generative AI: Four Use Cases for ChatGPT / GPT-4
Trustworthy Analytics with Generative AI: Four Use Cases for ChatGPT / GPT-4Trustworthy Analytics with Generative AI: Four Use Cases for ChatGPT / GPT-4
Trustworthy Analytics with Generative AI: Four Use Cases for ChatGPT / GPT-4Inspirient
 
Standardized Annotations for Survey Datasets: Enabling Automated Quality Assu...
Standardized Annotations for Survey Datasets: Enabling Automated Quality Assu...Standardized Annotations for Survey Datasets: Enabling Automated Quality Assu...
Standardized Annotations for Survey Datasets: Enabling Automated Quality Assu...Inspirient
 
Diminishing Returns: When Should Real- world Surveys Stop Sampling?
Diminishing Returns: When Should Real- world Surveys Stop Sampling?Diminishing Returns: When Should Real- world Surveys Stop Sampling?
Diminishing Returns: When Should Real- world Surveys Stop Sampling?Inspirient
 
Complementing Political / Social Research with AI for a Deeper Understanding ...
Complementing Political / Social Research with AI for a Deeper Understanding ...Complementing Political / Social Research with AI for a Deeper Understanding ...
Complementing Political / Social Research with AI for a Deeper Understanding ...Inspirient
 
Lessons from Analyzing All of Europe's Data
Lessons from Analyzing All of Europe's DataLessons from Analyzing All of Europe's Data
Lessons from Analyzing All of Europe's DataInspirient
 
Insights beyond Human Intuition: Comprehensively Mining Survey Data
Insights beyond Human Intuition: Comprehensively Mining Survey DataInsights beyond Human Intuition: Comprehensively Mining Survey Data
Insights beyond Human Intuition: Comprehensively Mining Survey DataInspirient
 
Digital Crime Scene Investigation
Digital Crime Scene InvestigationDigital Crime Scene Investigation
Digital Crime Scene InvestigationInspirient
 
Scalability and Autonomous Analytics
Scalability and Autonomous AnalyticsScalability and Autonomous Analytics
Scalability and Autonomous AnalyticsInspirient
 
Future Roles in AI-enabled Organisations
Future Roles in AI-enabled OrganisationsFuture Roles in AI-enabled Organisations
Future Roles in AI-enabled OrganisationsInspirient
 
KI und Predictive Maintenance am Beispiel von DB Cargo
KI und Predictive Maintenance am Beispiel von DB CargoKI und Predictive Maintenance am Beispiel von DB Cargo
KI und Predictive Maintenance am Beispiel von DB CargoInspirient
 

Plus de Inspirient (11)

Inspirient's Pitch for the MRS Research Live Innovation of the Year Award 2023
Inspirient's Pitch for the MRS Research Live Innovation of the Year Award 2023Inspirient's Pitch for the MRS Research Live Innovation of the Year Award 2023
Inspirient's Pitch for the MRS Research Live Innovation of the Year Award 2023
 
Trustworthy Analytics with Generative AI: Four Use Cases for ChatGPT / GPT-4
Trustworthy Analytics with Generative AI: Four Use Cases for ChatGPT / GPT-4Trustworthy Analytics with Generative AI: Four Use Cases for ChatGPT / GPT-4
Trustworthy Analytics with Generative AI: Four Use Cases for ChatGPT / GPT-4
 
Standardized Annotations for Survey Datasets: Enabling Automated Quality Assu...
Standardized Annotations for Survey Datasets: Enabling Automated Quality Assu...Standardized Annotations for Survey Datasets: Enabling Automated Quality Assu...
Standardized Annotations for Survey Datasets: Enabling Automated Quality Assu...
 
Diminishing Returns: When Should Real- world Surveys Stop Sampling?
Diminishing Returns: When Should Real- world Surveys Stop Sampling?Diminishing Returns: When Should Real- world Surveys Stop Sampling?
Diminishing Returns: When Should Real- world Surveys Stop Sampling?
 
Complementing Political / Social Research with AI for a Deeper Understanding ...
Complementing Political / Social Research with AI for a Deeper Understanding ...Complementing Political / Social Research with AI for a Deeper Understanding ...
Complementing Political / Social Research with AI for a Deeper Understanding ...
 
Lessons from Analyzing All of Europe's Data
Lessons from Analyzing All of Europe's DataLessons from Analyzing All of Europe's Data
Lessons from Analyzing All of Europe's Data
 
Insights beyond Human Intuition: Comprehensively Mining Survey Data
Insights beyond Human Intuition: Comprehensively Mining Survey DataInsights beyond Human Intuition: Comprehensively Mining Survey Data
Insights beyond Human Intuition: Comprehensively Mining Survey Data
 
Digital Crime Scene Investigation
Digital Crime Scene InvestigationDigital Crime Scene Investigation
Digital Crime Scene Investigation
 
Scalability and Autonomous Analytics
Scalability and Autonomous AnalyticsScalability and Autonomous Analytics
Scalability and Autonomous Analytics
 
Future Roles in AI-enabled Organisations
Future Roles in AI-enabled OrganisationsFuture Roles in AI-enabled Organisations
Future Roles in AI-enabled Organisations
 
KI und Predictive Maintenance am Beispiel von DB Cargo
KI und Predictive Maintenance am Beispiel von DB CargoKI und Predictive Maintenance am Beispiel von DB Cargo
KI und Predictive Maintenance am Beispiel von DB Cargo
 

Dernier

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 

Dernier (20)

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 

Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles

  • 1. Verian Inspirient Patrick Mertes ▪ Inspirient Sophie Tschersich ▪ Verian Group General Online Research Conference 2024 (GOR 24) Cologne, 22 February 2024 Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles
  • 2. Verian Inspirient Why is it necessary to classify job titles? • Essential Data: Job titles are gathered in nearly all empirical social research studies • Analytical Necessity: Classifying occupational groups can be crucial for analyzing survey data • Standardization Systems: Two common systems are used for occupation coding in Germany: • ISCO (international): Ensures international comparability, governed by the ILO • KldB (national): Provides national comparability, administered by the Federal Statistical Office How was this done without neural networks? • Traditional Approach: Utilized lookup tables for exact matching with official directories • Inherent Limitations: Variations in spelling, paraphrasing, and typos required extensive manual review Relevance and Research Question Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles The Need to Automate Free Text Classification 22 Feb 2024 1 A time-consuming and labour-intensive task that easily suffers from inconsistencies due to differences in expertise
  • 3. Verian Inspirient Hierarchical Structure Count Example 1 Example 2 Family medical practitioner Shoemaker Major Groups 10 2 - Professionals 7 - Craft and related trades workers Sub-Major Groups 43 22 - Health Professionals 75 - Food processing, wood working, garment and other craft and related trades workers Minor Groups 130 221 - Medical Doctors 753 - Garment and Related Trades Workers Unit Groups 436 2211 - Generalist Medical Practitioners 2212 - Specialist Medical Practitioners 7531 - Tailors, Dressmakers, Furriers and Hatters 7532 - Garment and Related Patternmakers and Cutters 7533 - Sewing, Embroidery and Related Workers 7534 - Upholsterers and Related Workers 7535 - Pelt Dressers, Tanners and Fellmongers 7536 Shoemakers and Related Workers ISCO-08: Framework for organizing information on labour and jobs globally, with worldwide acceptance for categorizing job titles Quick Example 2 Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles ISCO-08 in a Nutshell 22 Feb 2024 This is what we need to classify
  • 4. Verian Inspirient Goal Increased efficiency Increased consistency through recommendations Retain highest accuracy Integration into existing workflow Less than 50% recommended codes need manual decision More stable project planning and calculation Goals 3 Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles Aiming for Efficiency and Consistency 22 Feb 2024
  • 5. Verian Inspirient Available Data • Historical data from several past years projects • Around 100.000 distinct German job titles Difficulties for Machine Learning (examples) • Unique: Teilzeitarbeitsverhältnis im Kino Cineplex Germering, Service-Mitarbeiter hinter der Theke zum Essen ausgeben & Sääle säubern • Very similar, but different class: Lehrerin, Fahrlehrer BE, Grundschullehrerin, Förderlehrerin, Klavierlehrerin, etc. • Multiple jobs with different classes: 1. Fitnesstrainierin (Aerobickurse) 2. Gymnasiallehrerin (Sport und Deutsch), Schulleitung und Lehrerin Grundschule - Verwaltungsaufgaben, etc. • Gender-specific spelling variations: Lehrender, Lehrerin, Lehrer; Kaufmann, Kauffrau; Koch, Köchin Machine Learning Setup 4 Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles From Job Titles to Training Data 22 Feb 2024
  • 6. Verian Inspirient Technical Dataflow Pipeline 5 Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles AI Classification Steps 22 Feb 2024 Police officer Head teacher of an elementary school was head of a department until 1.6.2022; after restructuring I now have to wait and see what happens next; Part-time job at the Cineplex Germering cinema, service employee behind the counter serving food & cleaning the auditoriums. 5412 (99% confidence) Police Officer 1349 (17% confidence) Professional Services Managers Not Elsewhere Classified 5131 (49% confidence) Waiters 1345 (99% confidence) Education Managers Input Machine Learning Pipeline Output Pre- processing Multi-Layer Perceptron Hashing Vectorizer
  • 7. Verian Inspirient Focus on pre-trained classifiers 6 Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles Evaluation of Classification Accuracy 22 Feb 2024 • Project_A.sav • Project_B.sav • Project_C.sav Accuracy 83% Coverage 86% Accuracy 83% Coverage 84% Accuracy 83% Coverage 88% Classification System ISCO-08 International Standard Classification of Occupations Verian GER Evaluation Datasets 1. Classifiers configured to only report classifications with >40% confidence for this evaluation 86% of the data with 83 % accuracy Approach 1) Train classifiers with selection of Verian’s historic manual classifications 2) Apply classifier to other datasets to test real-world applicability1 But the labels should be 100% accurate. How do we know when to trust the AI?
  • 8. Verian Inspirient Trading lower coverage for increased accuracy 7 Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles Deep-dive: Accuracy vs. Coverage 22 Feb 2024 50% 60% 70% 80% 90% 100% 50% 60% 70% 80% 90% 100% ISCO-08 Accuracy ISCO-08 Coverage Project_A.sav Increasing confidence cut-offs decreases coverage, but increases accuracy
  • 9. Verian Inspirient 8 Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles 22 Feb 2024 incl. per-interview recommended action and suggested classifications For Real Efficiency Gains, Classifier Results Are Tightly Integrated into Existing Workflow
  • 10. Verian Inspirient 9 Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles 22 Feb 2024
  • 11. Verian Inspirient Business Impact Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles Six Key Benefits 22 Feb 2024 10 Goal Increased efficiency Increased consistency through recommendations Retain highest accuracy Integration into existing workflow Less than 50% recommended codes need manual decision More stable project planning and calculation Manual decisions still required, also as per client mandate, but much faster
  • 12. Verian Inspirient Potential improvements to NN classifiers: • Continuous Improvement: Accuracy enhancement through ongoing learning from new data and corrections. • Training for Coders: Up-skilling manual coders with datasets for improved efficiency and accuracy. • Broader Implementation: Extending the model to additional national and international classifications, like ISCED and KldB2010. First tests show that the same approach works very well. Next steps to further advance this method Free Text Classification with Neural Networks: Training, Process Integration and Results for ISCO-08 Job Titles Future Work 22 Feb 2024 11 Accuracy 82% Coverage 89% Accuracy 89% Coverage 90% KldB 2010 Klassifikation der Berufe WZ 2008 Klassifikation der Wirtschaftszweige
  • 13. Verian Inspirient Contact Information Patrick Mertes ▪ patrick.mertes@inspirient.com Sophie Tschersich ▪ sophie.tschersich@veriangroup.com