RecSys Challenge 2016: Job Recommendation
Based on Factorization Machines and Topic
Modelling
7th place solution
Vasily Leksin, Andrey Ostapets
Avito.ru
15-09-2016
Problem statement
Data description
∙ Impressions: which items (job postings) the existing recommender showed to which users (19 August 2015 to 9 November 2015).
∙ Interactions: actions users performed on items (click, bookmark, reply, or delete).
∙ Users: user details (job roles, career level, discipline, industry, location, experience, and education).
∙ Items: item details (title, career level, discipline, industry, location, employment type, tags, creation time, and a flag indicating whether the item was active during the test period).
Vasily Leksin, Andrey Ostapets Avito.ru 15-09-2016 2 / 23
Data description: impressions and interactions
Date interval: 2015-08-19 – 2015-11-09
Impressions
∙ 201M unique user-item-week tuples
∙ 2.7M unique users
∙ 846K unique items
Interactions
∙ 8.8M events: clicked – 7.2M, deleted – 1.0M, replied – 422K,
bookmarked – 206K
∙ 785K unique users
∙ 1.03M unique items
∙ 2.8M of the 6.9M (user, item) interaction pairs also appear in impressions
Data description: target users and items
150K target users to make recommendations for, of whom:
∙ 39.7K (26.5%) have no events
∙ 59.5K (39.6%) have fewer than 2 events
∙ 70.6K (47.1%) have fewer than 3 events
327K active items, of which:
∙ 129K (39.5%) have no events
∙ 164K (50.1%) have fewer than 2 events
∙ 188K (57.6%) have fewer than 3 events
Task of the challenge
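$$\mathrm{score}(R, \hat{R}) = \sum_{u \in U} \Big[ 20\big(P_2(r_u, \hat{r}_u) + P_4(r_u, \hat{r}_u) + R_{30}(r_u, \hat{r}_u) + S_{30}(r_u, \hat{r}_u)\big) + 10\big(P_6(r_u, \hat{r}_u) + P_{20}(r_u, \hat{r}_u)\big) \Big]$$
where
$U = \{0, \dots, N-1\}$: the set of target users,
$R = \{r_u\}_{u \in U}$: the lists of relevant items,
$\hat{R} = \{\hat{r}_u\}_{u \in U}$: the solution (the recommended lists),
$P_k(r_u, \hat{r}_u)$: precision at top $k$ for user $u$,
$R_{30}(r_u, \hat{r}_u)$: recall at top 30,
$S_{30}(r_u, \hat{r}_u)$: user success (1 if at least one relevant item is in the top 30, otherwise 0).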
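A minimal Python sketch of this metric, handy for local validation. It assumes standard definitions that the slide does not spell out: precision at k is the number of hits in the top k divided by k, recall at 30 is the number of hits in the top 30 divided by the number of relevant items, and a user with no relevant items contributes 0. The function names are hypothetical; this is not the official evaluation script.

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(relevant: Set[int], recommended: Sequence[int], k: int) -> float:
    # Hits among the first k recommendations, divided by k.
    return len(set(recommended[:k]) & relevant) / k


def recall_at_k(relevant: Set[int], recommended: Sequence[int], k: int) -> float:
    # Hits among the first k recommendations, divided by the number of relevant items.
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & relevant) / len(relevant)


def user_success(relevant: Set[int], recommended: Sequence[int], k: int = 30) -> float:
    # 1.0 if at least one relevant item appears in the first k recommendations.
    return 1.0 if set(recommended[:k]) & relevant else 0.0


def challenge_score(relevant: Dict[int, Set[int]], solution: Dict[int, List[int]]) -> float:
    """Sum of per-user scores over all target users in `relevant`."""
    total = 0.0
    for user, r_u in relevant.items():
        rec = solution.get(user, [])
        total += 20 * (precision_at_k(r_u, rec, 2) + precision_at_k(r_u, rec, 4)
                       + recall_at_k(r_u, rec, 30) + user_success(r_u, rec, 30))
        total += 10 * (precision_at_k(r_u, rec, 6) + precision_at_k(r_u, rec, 20))
    return total
```

For a user with relevant items {1, 2} and recommendations [1, 5, 2, 7], this gives 20 × (0.5 + 0.5 + 1 + 1) + 10 × (2/6 + 2/20) ≈ 64.33.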
Models validation
∙ Hold-out period: the last week of interactions
∙ Target users: 10,000 random users among those who interacted with any item during that week
∙ Old items (created more than a month ago) without interactions were removed
∙ The local score was highly correlated with the Public Leaderboard score
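The split can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's code: events are hypothetical (user_id, item_id, unix_timestamp) tuples, "a month" is taken as 30 days, and "without interactions" is read as having no interactions before the hold-out week.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple

WEEK = 7 * 24 * 3600
MONTH = 30 * 24 * 3600  # assumption: "a month" taken as 30 days

Event = Tuple[int, int, int]  # (user_id, item_id, unix_ts), hypothetical layout


def make_local_validation(
    interactions: Iterable[Event],
    item_created_at: Dict[int, int],  # item_id -> creation unix_ts
    n_users: int = 10_000,
    seed: int = 0,
) -> Tuple[List[Event], Dict[int, Set[int]], Set[int]]:
    events = list(interactions)
    end = max(ts for _, _, ts in events)
    split = end - WEEK  # the last week becomes the hold-out period

    train = [e for e in events if e[2] < split]
    holdout = [e for e in events if e[2] >= split]

    # Target users: a random sample of users active during the last week.
    active = sorted({u for u, _, _ in holdout})
    rng = random.Random(seed)
    targets = set(rng.sample(active, min(n_users, len(active))))

    # Ground truth: items each target user interacted with in the last week.
    truth: Dict[int, Set[int]] = {}
    for u, i, _ in holdout:
        if u in targets:
            truth.setdefault(u, set()).add(i)

    # Candidate items: drop old items (created more than a month before the
    # split) that have no interactions in the training period.
    items_with_events = {i for _, i, _ in train}
    candidates = {
        i for i, created in item_created_at.items()
        if created >= split - MONTH or i in items_with_events
    }
    return train, truth, candidates
```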
Solution of the team
Interesting insights from data
∙ A significant proportion of users and items have few or no events, so we need a hybrid approach that uses the content data of items and users in addition to collaborative filtering.
∙ Impressions change slowly over time, so the presence of a user-item pair in past impressions is a useful signal; we use it as a separate model.
∙ Geographical features (distance, region, city, geoclusters, etc.) do not improve the score significantly.
∙ Tokens from user profiles and items are good features.
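The past-impressions idea can be sketched as a binary scorer. This is a hypothetical minimal version, assuming impressions arrive as (user_id, item_id) pairs and that any pair not returned scores 0.

```python
from typing import Dict, Iterable, Set, Tuple


def past_impressions_scores(
    impressions: Iterable[Tuple[int, int]],  # (user_id, item_id) pairs shown in the past
    target_users: Iterable[int],
    candidate_items: Set[int],
) -> Dict[Tuple[int, int], float]:
    """Binary model: 1.0 for each (target user, candidate item) pair already shown."""
    targets = set(target_users)
    scores: Dict[Tuple[int, int], float] = {}
    for user, item in impressions:
        if user in targets and item in candidate_items:
            scores[(user, item)] = 1.0
    return scores
```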
Interesting insights from data
[Figure: User profile and sessions example.
CV: experience 5 or more entries, 10-15 years; discipline Sales & Commerce; country Germany; region Bavaria; job roles ['962959', '283291', '502342'].
Sessions: 25 rows from 09-01 1:27 to 09-10 1:00, with columns created_at, impression, discipline_id_item, country_item, region_item, title, tags. The job-role token '502342' also appears in several item titles and tags.]
Item-based collaborative filtering
Similarity metrics:
∙ Jaccard
∙ Cosine
∙ Pearson
Event types for training:
∙ All positive interactions
∙ Only Click interactions
∙ Impressions
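A minimal sketch of the three similarity measures on binary user-item data, plus a simple item-based scorer. Assumptions: each item is represented by the set of users who had an event of the chosen type with it (all positive interactions, clicks only, or impressions), and a candidate's score for a user is the sum of its similarities to the items in that user's history. The slides do not state the aggregation rule, and all names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

Sim = Callable[[Set[int], Set[int]], float]


def item_user_sets(events: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    # item_id -> set of user_ids that had an event of the chosen type with it
    users_of: Dict[int, Set[int]] = defaultdict(set)
    for user, item in events:
        users_of[item].add(user)
    return dict(users_of)


def jaccard(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a: Set[int], b: Set[int]) -> float:
    # Cosine of two binary user vectors: |A ∩ B| / sqrt(|A| * |B|)
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def pearson(a: Set[int], b: Set[int], n_users: int) -> float:
    # Pearson correlation of two binary vectors of length n_users (phi coefficient)
    n11, na, nb = len(a & b), len(a), len(b)
    den = math.sqrt(na * (n_users - na) * nb * (n_users - nb))
    return (n_users * n11 - na * nb) / den if den else 0.0


def item_based_scores(history: Iterable[int], candidates: Iterable[int],
                      users_of: Dict[int, Set[int]], sim: Sim) -> Dict[int, float]:
    # Score each candidate by summing its similarity to every item in the user's history
    hist = list(history)
    empty: Set[int] = set()
    return {c: sum(sim(users_of.get(h, empty), users_of.get(c, empty)) for h in hist)
            for c in candidates}
```

For the Pearson variant, pass `lambda a, b: pearson(a, b, n_users)` as `sim`.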
Factorization Machines
Predicted score for user i on item j is given by:
$$p(i, j) = \mu + w_i + w_j + a^T x_i + b^T y_j + u_i^T v_j,$$
where
$\mu$ is a global bias term,
$w_i$ and $w_j$ are weight terms for user $i$ and item $j$, respectively,
$x_i$ and $y_j$ are the user and item side-feature vectors,
$a$ and $b$ are the weight vectors for those side features,
$u_i$ and $v_j$ are latent factors: vectors of fixed length (the number of factors is a parameter).
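Once the parameters are learned, the formula can be evaluated directly. This pure-Python sketch only computes p(i, j) from given parameters and ranks candidate items; training is not shown, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vector lengths differ")
    return sum(x * y for x, y in zip(a, b))


def fm_score(mu: float, w_i: float, w_j: float,
             a: Sequence[float], x_i: Sequence[float],
             b: Sequence[float], y_j: Sequence[float],
             u_i: Sequence[float], v_j: Sequence[float]) -> float:
    # p(i, j) = mu + w_i + w_j + a^T x_i + b^T y_j + u_i^T v_j
    return mu + w_i + w_j + dot(a, x_i) + dot(b, y_j) + dot(u_i, v_j)


def top_k_items(scores: Dict[int, float], k: int = 30) -> List[int]:
    # Item ids sorted by descending predicted score (ties broken by item id)
    return sorted(scores, key=lambda j: (-scores[j], j))[:k]
```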
Factorization Machines: main parameters
∙ Number of latent factors (30 – 400)
∙ Number of sampled negative examples (1 – 12)
∙ Maximum number of iterations (25 – 70)
∙ Regularization parameters (1e-9 – 1e-7)
∙ User and item side features
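The slides list the number of sampled negatives as a tuning parameter but do not describe how negatives are drawn. As a hypothetical illustration, uniform sampling without replacement from the items a user has not interacted with could look like this. The names are assumptions, and rebuilding the unseen-item list for every positive is fine for a sketch but too slow for the full dataset.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def sample_training_triples(positives: Sequence[Tuple[int, int]],
                            all_items: Sequence[int],
                            n_neg: int,
                            seed: int = 0) -> List[Tuple[int, int, int]]:
    """Return (user, item, label) triples: every positive pair plus up to n_neg negatives."""
    rng = random.Random(seed)
    seen: Dict[int, Set[int]] = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    items = sorted(set(all_items))
    triples: List[Tuple[int, int, int]] = []
    for user, item in positives:
        triples.append((user, item, 1))
        # Negatives: items this user never interacted with, drawn uniformly
        unseen = [j for j in items if j not in seen[user]]
        for j in rng.sample(unseen, min(n_neg, len(unseen))):
            triples.append((user, j, 0))
    return triples
```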
Factorization Machines: side features
Users - all features (OneHotEncoder)
∙ jobroles
∙ career_level, discipline_id, industry_id
∙ country, region
∙ experience: n_entries_class, years, years_in_current
∙ edu: degree, field_of_studies
Items - all features, except latitude and longitude
(OneHotEncoder)
∙ title, tags
∙ career_level, discipline_id, industry_id
∙ country, region
∙ employment_type
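The slides name OneHotEncoder (presumably scikit-learn's) without further detail. Because fields such as jobroles, title and tags hold token lists, this sketch uses a hypothetical hand-rolled encoder instead: every (field, value) pair gets its own column, and list fields activate one column per token. It is an illustration, not the team's preprocessing.

```python
from typing import Dict, List, Mapping, Sequence


class SideFeatureEncoder:
    """One-hot encoder for scalar and multi-valued (list) categorical fields."""

    def __init__(self, fields: Sequence[str]):
        self.fields = list(fields)
        self.index: Dict[str, int] = {}  # "field=value" -> column number

    @staticmethod
    def _tokens(field: str, value: object) -> List[str]:
        # Lists/tuples/sets (e.g. jobroles, tags) yield one token per element
        values = value if isinstance(value, (list, tuple, set)) else [value]
        return [f"{field}={v}" for v in values]

    def fit(self, rows: Sequence[Mapping[str, object]]) -> "SideFeatureEncoder":
        for row in rows:
            for field in self.fields:
                if field in row:
                    for tok in self._tokens(field, row[field]):
                        self.index.setdefault(tok, len(self.index))
        return self

    def transform(self, row: Mapping[str, object]) -> Dict[int, float]:
        # Sparse vector {column: 1.0}; values not seen during fit are ignored
        out: Dict[int, float] = {}
        for field in self.fields:
            if field in row:
                for tok in self._tokens(field, row[field]):
                    col = self.index.get(tok)
                    if col is not None:
                        out[col] = 1.0
        return out
```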
Topic model: Latent Semantic Indexing (LSI)
∙ Let the document associated with each user consist of the title and tag tokens of the items the user interacted with, plus the job-role tokens from the user's profile.
∙ Convert each document into a token-count vector.
∙ Transform the values in each vector into TF-IDF weights and stack all vectors into one large token-document matrix.
∙ Apply singular value decomposition (SVD) to the token-document matrix.
∙ The similarity between a user and an item is the similarity between their corresponding latent vectors.
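A minimal NumPy sketch of these steps. The team used gensim; this stand-in instead computes TF-IDF as count × log(N / df), runs a dense SVD with `numpy.linalg.svd`, projects both user and item documents onto the top k left singular vectors, and compares them with cosine similarity. The weighting, the cosine choice and all names are assumptions, and a dense SVD is only practical for small corpora.

```python
import math
from collections import Counter
from typing import Sequence

import numpy as np


class LsiSketch:
    def __init__(self, user_docs: Sequence[Sequence[str]], k: int):
        # Vocabulary and inverse document frequencies from the user documents
        tokens = sorted({t for doc in user_docs for t in doc})
        self.vocab = {t: i for i, t in enumerate(tokens)}
        n_docs = len(user_docs)
        df = Counter(t for doc in user_docs for t in set(doc))
        self.idf = np.array([math.log(n_docs / df[t]) for t in tokens])
        # Token-document TF-IDF matrix: one column per user document
        m = np.column_stack([self._tfidf(doc) for doc in user_docs])
        u, _, _ = np.linalg.svd(m, full_matrices=False)
        self.u_k = u[:, :k]  # top-k left singular vectors span the latent space

    def _tfidf(self, doc: Sequence[str]) -> np.ndarray:
        v = np.zeros(len(self.vocab))
        for tok in doc:
            idx = self.vocab.get(tok)
            if idx is not None:  # tokens outside the vocabulary are ignored
                v[idx] += 1.0
        return v * self.idf

    def project(self, doc: Sequence[str]) -> np.ndarray:
        # Latent vector of a user or item document
        return self.u_k.T @ self._tfidf(doc)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    na, nb = float(np.linalg.norm(a)), float(np.linalg.norm(b))
    return float(a @ b) / (na * nb) if na > 0 and nb > 0 else 0.0
```

An item document (its title and tag tokens) is projected with the same `project` call, so users and items share one latent space.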
Solution framework
Initial dataset
→ Item-based models | FM models | Topic model
→ Blending
→ Output
Linear Ensemble
Base models:
FR0 | SIM0 | PI | Local score
1   | 0    | 0  | 76995
0   | 1    | 0  | 69622
0   | 0    | 1  | 104495
1   | 1    | 1  | 132505
∙ SIM0 – Item-based Recommender (jaccard similarity)
∙ FR0 – Factorization Machines Recommender (400 factors)
∙ PI – Past Impressions Recommender (very simple model with
binary output)
∙ Local Score: score on 10,000 random users who interacted with items during the last week
∙ The Public Leaderboard score ≈ 3.8 × the local validation score
Linear Ensemble
Version «zero»:
FR0 | SIM0 | PI | Local score
1   | 2    | 1  | 134285
The first version:
FR0 | SIM0 | FR0^8.0 * SIM0 | PI | Local score
1   | 13   | 8              | 1  | 138073
The second version:
FR1 | SIM0 | FR1^8.0 * SIM0 | PI | Local score
1   | 13   | 8              | 1  | 140876
FR1 = FR_f100_i25
Linear Ensemble
The third version:
FR2 | SIM2 | FR2^8.0 * SIM2 | PI | Local score
1   | 13   | 8              | 1  | 143653
FR2 = 0.5*FR_f100_i25 + 0.5*FR_f400_i70
SIM2 = 0.5*SIM_jac + 0.5*SIM_click
Linear Ensemble
Local Score (145841):
1.0*FR3 + 15.0*(FR3^8.0 * SIM3) + 13.0*SIM3 + 1.0*PI - 0.5*SIM_pearson - 0.3*FR_f400_i50_no_side + 0.5*(FR_imp^2.0 * SIM_imp), where
FR3 = 0.5*FR_f100_i25 + 0.5*FR_f400_i70
SIM3 = SIM_click
Local Score (146569):
1.0*FR4 + 15.0*(FR4^8.0 * SIM4) + 13.0*SIM4 + 1.0*PI - 0.4*SIM_pearson - 0.3*FR_f400_i50_no_side + 0.5*(FR_imp^2.0 * SIM_imp) + 0.2*TM, where
FR4 = 0.5*FR_f100_i25 + 0.5*FR_f400_i70
SIM4 = 0.4*SIM_jac + 0.6*SIM_{t=1}
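The 146569 blend can be written as a small function over per-model score lists. This is a sketch under assumptions the slides do not state: all models score the same list of candidate (user, item) pairs, each model's scores have already been rescaled to [0, 1] (min-max scaling is shown as one hypothetical option), and the dictionary keys, including `SIM_t1` for SIM_{t=1}, are illustrative names.

```python
from typing import Dict, List, Sequence


def min_max(xs: Sequence[float]) -> List[float]:
    # Rescale one model's scores to [0, 1]; a constant model maps to all zeros
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def blend_146569(s: Dict[str, Sequence[float]]) -> List[float]:
    """Blend per-model scores that are aligned on the same candidate list."""
    out: List[float] = []
    for k in range(len(s["PI"])):
        fr4 = 0.5 * s["FR_f100_i25"][k] + 0.5 * s["FR_f400_i70"][k]
        sim4 = 0.4 * s["SIM_jac"][k] + 0.6 * s["SIM_t1"][k]
        out.append(
            1.0 * fr4
            # large only when the FM score is near 1 and the item-based score is positive
            + 15.0 * (fr4 ** 8.0 * sim4)
            + 13.0 * sim4
            + 1.0 * s["PI"][k]
            - 0.4 * s["SIM_pearson"][k]
            - 0.3 * s["FR_f400_i50_no_side"][k]
            + 0.5 * (s["FR_imp"][k] ** 2.0 * s["SIM_imp"][k])
            + 0.2 * s["TM"][k]
        )
    return out
```

The highest-scoring candidates per user would then form that user's recommendation list.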
Final models set
Model | Description
SIM_jac | Item-based, Jaccard similarity
SIM_click | Item-based, Jaccard similarity on clicks
SIM_pearson | Item-based, Pearson similarity
SIM_imp | Item-based, Jaccard similarity on impressions
FR_f100_i25 | Factorization, n_factors=100, iter=25
FR_f400_i70 | Factorization, n_factors=400, iter=70
FR_f400_i50_no_side | Factorization, no side data
FR_imp | Factorization on impressions
TM | LSI topic model
PI | Past Impressions
Hardware&Software
∙ 1 server: 28 cores, 56 threads, 256 GB RAM
∙ Full training and prediction: 16 hours
∙ All code was written in Python
∙ ML libraries: graphlab, gensim
Leaderboard
Submission history
Score | Rank | Name | Date
554655 | 9 | Topic model 100 factors | 06/27/16
548366 | 8 | Top 150 candidates from every model | 06/25/16
543284 | 8 | 8 model set: 4FR + 4SIM + TM | 06/24/16
537157 | 9 | Topic model | 06/23/16
530599 | 10 | 3 models set: FR + 2SIM | 06/23/16
497136 | 15 | 3 models set: FR + 2SIM | 06/22/16
496241 | 1 | FR with side data | 03/20/16
397604 | 1 | Past Impressions model | 03/11/16
132790 | 1 | Simple item-based recommender | 03/10/16
Results
Rank (by full score) | Team | Leaderboard score | Full score
1 | YunOS-OneSearch | 681707.38 | 2052185.54
2 | mim-solutions | 675985.03 | 2035964.16
3 | DaveXster | 665592.06 | 2005263.73
4 | PumpkinPie | 622408.55 | 1866477.77
5 | milk tea | 613125.21 | 1846420.12
6 | mdr_rec | 605048.58 | 1823472.31
7 | Avito | 554654.72 | 1677898.52
8 | recometric | 556133.18 | 1677233.84
9 | nodalpoints | 555483.39 | 1671812.08
10 | lucky_dog | 542213.51 | 1632828.82
21 | XING_TELECOM | 461000.32 | 1397030.74
Thank you!
Vasily Leksin, Andrey Ostapets Avito.ru 15-09-2016 23 / 23

Contenu connexe

Tendances

1030 track 2 barrett_using our laptop
1030 track 2 barrett_using our laptop1030 track 2 barrett_using our laptop
1030 track 2 barrett_using our laptopRising Media, Inc.
 
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMFrom Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMMichał Łopuszyński
 
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...Matthias Braunhofer
 
Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen...
Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen...Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen...
Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen...Matthias Braunhofer
 
Measuring effectiveness of machine learning systems
Measuring effectiveness of machine learning systemsMeasuring effectiveness of machine learning systems
Measuring effectiveness of machine learning systemsAmit Sharma
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationScientificRevenue
 
Contextual Information Elicitation in Travel Recommender Systems
Contextual Information Elicitation in Travel Recommender SystemsContextual Information Elicitation in Travel Recommender Systems
Contextual Information Elicitation in Travel Recommender SystemsMatthias Braunhofer
 
Beyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingBeyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingPierre Gutierrez
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...Jihoo Kim
 
AIRG Presentation
AIRG PresentationAIRG Presentation
AIRG Presentationnirvdrum
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandrySri Ambati
 
Product Recommendations Enhanced with Reviews
Product Recommendations Enhanced with ReviewsProduct Recommendations Enhanced with Reviews
Product Recommendations Enhanced with Reviewsmaranlar
 
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryPredictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryMatouš Havlena
 
Group Project - Final - Linked in
Group Project - Final - Linked inGroup Project - Final - Linked in
Group Project - Final - Linked inSanket Butoliya
 
Introduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
Introduction to MaxDiff Scaling of Importance - Parametric Marketing SlidesIntroduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
Introduction to MaxDiff Scaling of Importance - Parametric Marketing SlidesQuestionPro
 
Make the Most of Your Time: How Should the Analyst Work with Automated Tracea...
Make the Most of Your Time: How Should the Analyst Work with Automated Tracea...Make the Most of Your Time: How Should the Analyst Work with Automated Tracea...
Make the Most of Your Time: How Should the Analyst Work with Automated Tracea...Tim Menzies
 

Tendances (20)

1030 track 2 barrett_using our laptop
1030 track 2 barrett_using our laptop1030 track 2 barrett_using our laptop
1030 track 2 barrett_using our laptop
 
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMFrom Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
 
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
Parsimonious and Adaptive Contextual Information Acquisition in Recommender S...
 
Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen...
Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen...Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen...
Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen...
 
Measuring effectiveness of machine learning systems
Measuring effectiveness of machine learning systemsMeasuring effectiveness of machine learning systems
Measuring effectiveness of machine learning systems
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous Optimization
 
Contextual Information Elicitation in Travel Recommender Systems
Contextual Information Elicitation in Travel Recommender SystemsContextual Information Elicitation in Travel Recommender Systems
Contextual Information Elicitation in Travel Recommender Systems
 
Beyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingBeyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modeling
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
[Paper Review] Personalized Top-N Sequential Recommendation via Convolutional...
 
AIRG Presentation
AIRG PresentationAIRG Presentation
AIRG Presentation
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
 
Product Recommendations Enhanced with Reviews
Product Recommendations Enhanced with ReviewsProduct Recommendations Enhanced with Reviews
Product Recommendations Enhanced with Reviews
 
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryPredictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive Industry
 
Group Project - Final - Linked in
Group Project - Final - Linked inGroup Project - Final - Linked in
Group Project - Final - Linked in
 
Weka linked in
Weka linked inWeka linked in
Weka linked in
 
Introduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
Introduction to MaxDiff Scaling of Importance - Parametric Marketing SlidesIntroduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
Introduction to MaxDiff Scaling of Importance - Parametric Marketing Slides
 
Make the Most of Your Time: How Should the Analyst Work with Automated Tracea...
Make the Most of Your Time: How Should the Analyst Work with Automated Tracea...Make the Most of Your Time: How Should the Analyst Work with Automated Tracea...
Make the Most of Your Time: How Should the Analyst Work with Automated Tracea...
 
Promise Keynote
Promise KeynotePromise Keynote
Promise Keynote
 

En vedette

RecSys Multi-Stack Ensemble for Job Recommendation, Pumpkin-Pie
RecSys Multi-Stack Ensemble for Job Recommendation, Pumpkin-PieRecSys Multi-Stack Ensemble for Job Recommendation, Pumpkin-Pie
RecSys Multi-Stack Ensemble for Job Recommendation, Pumpkin-PieTommaso Carpi
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...David Zibriczky
 
Temporal Learning and Sequence Modeling for a Job Recommender System
Temporal Learning and Sequence Modeling for a Job Recommender SystemTemporal Learning and Sequence Modeling for a Job Recommender System
Temporal Learning and Sequence Modeling for a Job Recommender SystemAnoop Kumar
 
Jobandtalent at recsys challenge 2016
Jobandtalent at recsys challenge 2016Jobandtalent at recsys challenge 2016
Jobandtalent at recsys challenge 2016Oscar Huarte
 
Canopy kmeans
Canopy kmeansCanopy kmeans
Canopy kmeansnagwww
 
Rob Nelson - Ideology and algorithms: the uses of nationalism in the American...
Rob Nelson - Ideology and algorithms: the uses of nationalism in the American...Rob Nelson - Ideology and algorithms: the uses of nationalism in the American...
Rob Nelson - Ideology and algorithms: the uses of nationalism in the American...Digital History
 
StreamGrid: Summarization of large-scale Events using Topic Modeling and Temp...
StreamGrid: Summarization of large-scale Events using Topic Modeling and Temp...StreamGrid: Summarization of large-scale Events using Topic Modeling and Temp...
StreamGrid: Summarization of large-scale Events using Topic Modeling and Temp...Symeon Papadopoulos
 
allegrotech - Data science meetup #1 Intro
allegrotech - Data science  meetup #1 Introallegrotech - Data science  meetup #1 Intro
allegrotech - Data science meetup #1 IntroBartlomiej Twardowski
 
Topic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and ApplicationsTopic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and ApplicationsAyush Jain
 
Fabrikatyr lda topic modelling practical application
Fabrikatyr lda topic modelling practical applicationFabrikatyr lda topic modelling practical application
Fabrikatyr lda topic modelling practical applicationTim Carnus
 
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...Alexis Perrier
 
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Daniele Di Mitri
 
Topic Modelling to identify behavioral trends in online communities
Topic Modelling to identify behavioral trends in online communities Topic Modelling to identify behavioral trends in online communities
Topic Modelling to identify behavioral trends in online communities Conor Duke
 
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016Jonathan Sedar
 
Word2Vec: Vector presentation of words - Mohammad Mahdavi
Word2Vec: Vector presentation of words - Mohammad MahdaviWord2Vec: Vector presentation of words - Mohammad Mahdavi
Word2Vec: Vector presentation of words - Mohammad Mahdaviirpycon
 
An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"sandinmyjoints
 

En vedette (20)

RecSys Multi-Stack Ensemble for Job Recommendation, Pumpkin-Pie
RecSys Multi-Stack Ensemble for Job Recommendation, Pumpkin-PieRecSys Multi-Stack Ensemble for Job Recommendation, Pumpkin-Pie
RecSys Multi-Stack Ensemble for Job Recommendation, Pumpkin-Pie
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
 
Temporal Learning and Sequence Modeling for a Job Recommender System
Temporal Learning and Sequence Modeling for a Job Recommender SystemTemporal Learning and Sequence Modeling for a Job Recommender System
Temporal Learning and Sequence Modeling for a Job Recommender System
 
Jobandtalent at recsys challenge 2016
Jobandtalent at recsys challenge 2016Jobandtalent at recsys challenge 2016
Jobandtalent at recsys challenge 2016
 
Recruit recsys-review-magambo
Recruit recsys-review-magamboRecruit recsys-review-magambo
Recruit recsys-review-magambo
 
Canopy kmeans
Canopy kmeansCanopy kmeans
Canopy kmeans
 
Thesis_Nazarova_Final(1)
Thesis_Nazarova_Final(1)Thesis_Nazarova_Final(1)
Thesis_Nazarova_Final(1)
 
SocialLda
SocialLda SocialLda
SocialLda
 
Rob Nelson - Ideology and algorithms: the uses of nationalism in the American...
Rob Nelson - Ideology and algorithms: the uses of nationalism in the American...Rob Nelson - Ideology and algorithms: the uses of nationalism in the American...
Rob Nelson - Ideology and algorithms: the uses of nationalism in the American...
 
StreamGrid: Summarization of large-scale Events using Topic Modeling and Temp...
StreamGrid: Summarization of large-scale Events using Topic Modeling and Temp...StreamGrid: Summarization of large-scale Events using Topic Modeling and Temp...
StreamGrid: Summarization of large-scale Events using Topic Modeling and Temp...
 
allegrotech - Data science meetup #1 Intro
allegrotech - Data science  meetup #1 Introallegrotech - Data science  meetup #1 Intro
allegrotech - Data science meetup #1 Intro
 
Topic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and ApplicationsTopic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and Applications
 
Fabrikatyr lda topic modelling practical application
Fabrikatyr lda topic modelling practical applicationFabrikatyr lda topic modelling practical application
Fabrikatyr lda topic modelling practical application
 
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
 
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation
 
Topic Modelling to identify behavioral trends in online communities
Topic Modelling to identify behavioral trends in online communities Topic Modelling to identify behavioral trends in online communities
Topic Modelling to identify behavioral trends in online communities
 
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
 
Recsys 2016
Recsys 2016Recsys 2016
Recsys 2016
 
Word2Vec: Vector presentation of words - Mohammad Mahdavi
Word2Vec: Vector presentation of words - Mohammad MahdaviWord2Vec: Vector presentation of words - Mohammad Mahdavi
Word2Vec: Vector presentation of words - Mohammad Mahdavi
 
An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"
 

Similar to Avito recsys-challenge-2016: RecSys Challenge 2016: Job Recommendation Based on Factorization Machines and Topic Modelling

Ice 2013-A Structured Team Building Method for Collaborative Crowdsourcing (Erre Quadro)

Enterprise Applications of Text Intelligence - Lecture slides (University St. Gallen)

Early Analysis and Debuggin of Linked Open Data Cubes (Enrico Daga)

Comparative analysis of national open data portals or whether your portal is ... (Anastasija Nikiforova)

IT Confidence 2013 - Spago4Q presents a 3D model for Productivity Intelligence (SpagoWorld)

TCI 2016 Business Upper Austria (TCI Network)

Achieving sustainable development by integrating it into the business proces... (Tomislav Rozman)

Lak2018: Scaling Nationally: Seven Lesson Learned (mwebbjisc)

Phase 1 Learning Analytics Intro Slides (Paul Bailey)

Predicting Employee Attrition using various techniques of Machine Learning (IRJET Journal)

Measuring quality of developments in a large industrial software factory with... (SpagoWorld)

Optimization Problems Solved by Different Platforms Say Optimum Tool Box (Mat... (IRJET Journal)

How to create self-service analytics tool from activity logs garbage (Anton Anokhin)

FLUX·3D - Forward Looking User eXperience (Mario Guillo)

Services to support FAIR data - Introduction (EOSC-hub project)

Day1 Sokwoo Rhee (US-Ignite)


Avito recsys-challenge-2016: RecSys Challenge 2016: Job Recommendation Based on Factorization Machines and Topic Modelling

  • 1. © Avito. RecSys Challenge 2016: Job Recommendation Based on Factorization Machines and Topic Modelling. 7th place solution. Vasily Leksin, Andrey Ostapets, Avito.ru, 15-09-2016
  • 2. Problem statement. Data description ∙ Impressions: which items (job postings) were shown to which user by the existing recommender (19 August 2015 to 9 November 2015). ∙ Interactions: actions the user performed on the items (clicked, bookmarked, replied or deleted). ∙ Users: user details (job roles, career level, discipline, industry, location, experience, and education). ∙ Items: item details (title, career level, discipline, industry, location, employment type, tags, creation time, and a flag indicating whether the item was active during the test period).
  • 3. Problem statement. Data description: impressions and interactions. Date interval: 2015-08-19 to 2015-11-09. Impressions ∙ 201M unique user-item-week tuples ∙ 2.7M unique users ∙ 846K unique items. Interactions ∙ 8.8M events: clicked 7.2M, deleted 1.0M, replied 422K, bookmarked 206K ∙ 785K unique users ∙ 1.03M unique items ∙ 2.8M of the 6.9M (user, item) pairs also appear in impressions
  • 4. Problem statement. Data description: target users and items. 150K target users for making recommendations, of which: ∙ 39.7K (26.5%) have no events ∙ 59.5K (39.6%) have fewer than 2 events ∙ 70.6K (47.1%) have fewer than 3 events. 327K active items, of which: ∙ 129K (39.5%) have no events ∙ 164K (50.1%) have fewer than 2 events ∙ 188K (57.6%) have fewer than 3 events
  • 5. Problem statement. Task of the challenge: the evaluation score is
$$\mathrm{score}(R, \hat{R}) = \sum_{u \in U} \Big[ 20\big(P_2(r_u, \hat{r}_u) + P_4(r_u, \hat{r}_u) + R_{30}(r_u, \hat{r}_u) + S_{30}(r_u, \hat{r}_u)\big) + 10\big(P_6(r_u, \hat{r}_u) + P_{20}(r_u, \hat{r}_u)\big) \Big],$$
where $U = \{0, \dots, N-1\}$ is the list of target users, $R = \{r_u\}_{u \in U}$ the lists of relevant items, $\hat{R} = \{\hat{r}_u\}_{u \in U}$ the solution, $P_k(r_u, \hat{r}_u)$ the precision at top $k$ for user $u$, $R_{30}(r_u, \hat{r}_u)$ the recall at top 30, and $S_{30}(r_u, \hat{r}_u)$ the user success.
  • 6. Problem statement. Model validation ∙ Validation period: the last week of interactions ∙ 10 000 random users among those who made any interaction during this week ∙ Old items (created more than a month ago) without interactions were removed ∙ The local score was highly correlated with the result on the Public Leaderboard
  • 7. Solution of the team. Interesting insights from data ∙ A significant proportion of users and items have few or no events, so we need a hybrid approach that uses not only collaborative filtering but also the content data of items and users. ∙ Impressions change slowly over time, so the presence of a user-item pair in impressions is a useful feature; we use it as a separate model. ∙ Geographical features (distance, region, city, geoclusters, etc.) do not improve the score significantly. ∙ Tokens from user profiles and items are good features.
  • 8. Solution of the team. Interesting insights from data: user profile and sessions example. [Table: the CV of one user (experience_n_entries_class: 5 or more entries; experience: 10-15 years; discipline: Sales & Commerce; country: Germany; region: Bavaria; jobroles: ['962959', '283291', '502342']) followed by the user's sessions from 09-01 to 09-10, one row per item with the columns created_at, impression (1 or 0), discipline_id_item, country_item, region_item, title (token IDs) and tags (token IDs); the title and tag lists are truncated in the original slide.]
  • 9. Solution of the team. Item-based collaborative filtering. Similarity metrics: ∙ Jaccard ∙ Cosine ∙ Pearson. Event types for training: ∙ All positive interactions ∙ Only click interactions ∙ Impressions
  • 10. Solution of the team. Factorization Machines. The predicted score for user $i$ on item $j$ is given by
$$p(i, j) = \mu + w_i + w_j + a^\top x_i + b^\top y_j + u_i^\top v_j,$$
where $\mu$ is a global bias term, $w_i$ and $w_j$ are the weight terms for user $i$ and item $j$ respectively, $x_i$ and $y_j$ are the user and item side feature vectors, $a$ and $b$ are the weight vectors for those side features, and $u_i$ and $v_j$ are the latent factors: vectors of fixed length (the number of factors is a parameter).
  • 11. Solution of the team. Factorization Machines: main parameters ∙ Number of latent factors (30 – 400) ∙ Number of sampled negative examples (1 – 12) ∙ Maximum number of iterations (25 – 70) ∙ Regularization parameters (1e-9 – 1e-7) ∙ User and item side features
  • 12. Solution of the team. Factorization Machines: side features. Users: all features (OneHotEncoder) ∙ jobroles ∙ career_level, discipline_id, industry_id ∙ country, region ∙ experience: n_entries_class, years, years_in_current ∙ edu: degree, field_of_studies. Items: all features except latitude and longitude (OneHotEncoder) ∙ title, tags ∙ career_level, discipline_id, industry_id ∙ country, region ∙ employment_type
  • 13. Solution of the team. Topic model: Latent Semantic Indexing (LSI) ∙ The document associated with each user consists of all title and tag tokens of the items the user interacted with, plus the job role tokens from the user profile. ∙ Convert each document into a token-occurrence vector. ∙ Transform the values in each vector to TF-IDF weights and combine all vectors into one large token-document matrix. ∙ Apply Singular Value Decomposition (SVD) to the token-document matrix. ∙ The similarity between a user and an item is the similarity between their latent vectors.
  • 14. Solution of the team. Solution framework (diagram): initial dataset → item-based models, FM models, topic model → blending → output
  • 15. Solution of the team. Linear Ensemble. Base models (blend weights of FR0, SIM0 and PI, then the local score): ∙ 1, 0, 0: 76995 ∙ 0, 1, 0: 69622 ∙ 0, 0, 1: 104495 ∙ 1, 1, 1: 132505. ∙ SIM0: item-based recommender (Jaccard similarity) ∙ FR0: Factorization Machines recommender (400 factors) ∙ PI: Past Impressions recommender (a very simple model with binary output) ∙ Local score: 10 000 random users who interacted with items during the last week ∙ The score on the Public Leaderboard ≈ 3.8 × the score on our local validation
  • 16. Solution of the team. Linear Ensemble. Version «zero»: weights FR0 = 1, SIM0 = 2, PI = 1; local score 134285. The first version: $1 \cdot \mathrm{FR}_0 + 13 \cdot \mathrm{SIM}_0 + 8 \cdot \big(\mathrm{FR}_0^{8.0} \cdot \mathrm{SIM}_0\big) + 1 \cdot \mathrm{PI}$; local score 138073. The second version: $1 \cdot \mathrm{FR}_1 + 13 \cdot \mathrm{SIM}_0 + 8 \cdot \big(\mathrm{FR}_1^{8.0} \cdot \mathrm{SIM}_0\big) + 1 \cdot \mathrm{PI}$; local score 140876, where $\mathrm{FR}_1 = \mathrm{FR\_f100\_i25}$.
  • 17. Solution of the team. Linear Ensemble. The third version: $1 \cdot \mathrm{FR}_2 + 13 \cdot \mathrm{SIM}_2 + 8 \cdot \big(\mathrm{FR}_2^{8.0} \cdot \mathrm{SIM}_2\big) + 1 \cdot \mathrm{PI}$; local score 143653, where $\mathrm{FR}_2 = 0.5 \cdot \mathrm{FR\_f100\_i25} + 0.5 \cdot \mathrm{FR\_f400\_i70}$ and $\mathrm{SIM}_2 = 0.5 \cdot \mathrm{SIM\_jac} + 0.5 \cdot \mathrm{SIM\_click}$.
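  • 18. Solution of the team. Linear Ensemble. Local score 145841:
$$1.0\,\mathrm{FR}_3 + 15.0\,\big(\mathrm{FR}_3^{8.0} \cdot \mathrm{SIM}_3\big) + 13.0\,\mathrm{SIM}_3 + 1.0\,\mathrm{PI} - 0.5\,\mathrm{SIM\_pearson} - 0.3\,\mathrm{FR\_f400\_i50\_no\_side} + 0.5\,\big(\mathrm{FR\_imp}^{2.0} \cdot \mathrm{SIM\_imp}\big),$$
where $\mathrm{FR}_3 = 0.5 \cdot \mathrm{FR\_f100\_i25} + 0.5 \cdot \mathrm{FR\_f400\_i70}$ and $\mathrm{SIM}_3 = \mathrm{SIM\_click}$. Local score 146569:
$$1.0\,\mathrm{FR}_4 + 15.0\,\big(\mathrm{FR}_4^{8.0} \cdot \mathrm{SIM}_4\big) + 13.0\,\mathrm{SIM}_4 + 1.0\,\mathrm{PI} - 0.4\,\mathrm{SIM\_pearson} - 0.3\,\mathrm{FR\_f400\_i50\_no\_side} + 0.5\,\big(\mathrm{FR\_imp}^{2.0} \cdot \mathrm{SIM\_imp}\big) + 0.2\,\mathrm{TM},$$
where $\mathrm{FR}_4 = 0.5 \cdot \mathrm{FR\_f100\_i25} + 0.5 \cdot \mathrm{FR\_f400\_i70}$ and $\mathrm{SIM}_4 = 0.4 \cdot \mathrm{SIM\_jac} + 0.6 \cdot \mathrm{SIM}_{t=1}$.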
  • 19. Solution of the team. Final models set ∙ SIM_jac: item-based Jaccard similarity ∙ SIM_click: item-based Jaccard similarity on clicks ∙ SIM_pearson: item-based Pearson similarity ∙ SIM_imp: item-based Jaccard similarity on impressions ∙ FR_f100_i25: factorization, n_factors=100, iter=25 ∙ FR_f400_i70: factorization, n_factors=400, iter=70 ∙ FR_f400_i50_no_side: factorization, no side data ∙ FR_imp: factorization on impressions ∙ TM: LSI topic model ∙ PI: past impressions
  • 20. Solution of the team. Hardware & software ∙ 1 server: 28 cores, 56 threads, 256 GB RAM ∙ Full training + prediction: 16 hours ∙ All code was written in Python ∙ ML libraries: graphlab, gensim
  • 21. Leaderboard. Submission history (score, rank: name, date) ∙ 554655, rank 9: Topic model 100 factors, 06/27/16 ∙ 548366, rank 8: Top 150 candidates from every model, 06/25/16 ∙ 543284, rank 8: 8 model set: 4FR + 4SIM + TM, 06/24/16 ∙ 537157, rank 9: Topic model, 06/23/16 ∙ 530599, rank 10: 3 models set: FR + 2SIM, 06/23/16 ∙ 497136, rank 15: 3 models set: FR + 2SIM, 06/22/16 ∙ 496241, rank 1: FR with side data, 03/20/16 ∙ 397604, rank 1: Past Impressions model, 03/11/16 ∙ 132790, rank 1: Simple item-based recommender, 03/10/16
  • 22. Leaderboard. Results (rank, team, leaderboard score, full score) ∙ 1, YunOS-OneSearch, 681707.38, 2052185.54 ∙ 2, mim-solutions, 675985.03, 2035964.16 ∙ 3, DaveXster, 665592.06, 2005263.73 ∙ 4, PumpkinPie, 622408.55, 1866477.77 ∙ 5, milk tea, 613125.21, 1846420.12 ∙ 6, mdr_rec, 605048.58, 1823472.31 ∙ 7, Avito, 554654.72, 1677898.52 ∙ 8, recometric, 556133.18, 1677233.84 ∙ 9, nodalpoints, 555483.39, 1671812.08 ∙ 10, lucky_dog, 542213.51, 1632828.82 ∙ 21, XING_TELECOM, 461000.32, 1397030.74
  • 23. Thank you!
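The challenge score on slide 5 can be sketched in a few lines of Python. The per-metric details are assumptions, not taken from the slides: precision at k divides the number of relevant items in the top k by k (even when fewer than k items are recommended), recall at 30 divides by the number of relevant items, and user success is 1 when at least one relevant item appears in the top 30. All function names are hypothetical.

```python
def precision_at_k(relevant, recommended, k):
    """Share of the top-k recommended items that are relevant (denominator is always k)."""
    return len(set(recommended[:k]) & relevant) / k


def recall_at_k(relevant, recommended, k):
    """Share of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & relevant) / len(relevant)


def user_success_at_k(relevant, recommended, k):
    """1.0 if at least one relevant item appears in the top-k, else 0.0."""
    return 1.0 if set(recommended[:k]) & relevant else 0.0


def challenge_score(relevant_by_user, recommended_by_user):
    """Sum the weighted per-user metrics over all target users."""
    total = 0.0
    for user, relevant_items in relevant_by_user.items():
        relevant = set(relevant_items)
        recommended = list(recommended_by_user.get(user, []))
        total += 20 * (
            precision_at_k(relevant, recommended, 2)
            + precision_at_k(relevant, recommended, 4)
            + recall_at_k(relevant, recommended, 30)
            + user_success_at_k(relevant, recommended, 30)
        )
        total += 10 * (
            precision_at_k(relevant, recommended, 6)
            + precision_at_k(relevant, recommended, 20)
        )
    return total


if __name__ == "__main__":
    # One target user with two relevant items, ranked 1st and 3rd.
    print(challenge_score({0: [1, 2]}, {0: [1, 3, 2, 4]}))
```

Because every user contributes at most 20 · 4 + 10 · 2 = 100 points, the score grows with the number of target users, which is consistent with the gap between the leaderboard and full scores on slide 22.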
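The local validation described on slide 6 could look roughly like the sketch below. Several details are assumptions: interactions are (user, item, day) tuples with integer day indices, "without interactions" is read as "no interactions before the held-out week", and `build_local_validation` is a hypothetical name. It illustrates the idea and is not the authors' code.

```python
import random


def build_local_validation(interactions, item_created_day, split_day,
                           n_users=10_000, max_age_days=30, seed=0):
    """Hold out the last week of interactions as a local test set.

    interactions: iterable of (user_id, item_id, day) tuples (day is an int index).
    item_created_day: dict mapping item_id -> creation day index.
    split_day: first day of the held-out week.
    """
    interactions = list(interactions)
    train = [row for row in interactions if row[2] < split_day]
    test = [row for row in interactions if row[2] >= split_day]

    # Sample target users among those active in the held-out week.
    week_users = sorted({user for user, _, _ in test})
    rng = random.Random(seed)
    sampled = set(rng.sample(week_users, min(n_users, len(week_users))))

    # Drop items older than max_age_days that had no interactions before the split.
    items_with_events = {item for _, item, _ in train}
    cutoff = split_day - max_age_days
    candidate_items = {
        item for item, created in item_created_day.items()
        if created >= cutoff or item in items_with_events
    }

    # Relevant items per sampled user, taken from the held-out week.
    ground_truth = {}
    for user, item, _ in test:
        if user in sampled:
            ground_truth.setdefault(user, set()).add(item)
    return train, ground_truth, candidate_items
```

A fixed seed keeps the sampled users stable across experiments, so local scores of different model versions remain comparable.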
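The item-based collaborative filtering on slide 9 can be sketched by representing each item as the set of users who interacted with it; Jaccard, cosine and Pearson similarities then follow from set sizes, since the vectors are binary. Scoring a candidate as the sum of its similarities to the user's items is an assumption, and every function name is hypothetical.

```python
import math
from collections import defaultdict


def users_by_item(events):
    """Map each item to the set of users who interacted with it."""
    index = defaultdict(set)
    for user, item in events:
        index[item].add(user)
    return index


def jaccard(a, b):
    """Jaccard similarity of two user sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a, b):
    """Cosine similarity of two binary user-indicator vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def pearson(a, b, n_users):
    """Pearson correlation of two binary user-indicator vectors of length n_users."""
    mean_a, mean_b = len(a) / n_users, len(b) / n_users
    cov = len(a & b) / n_users - mean_a * mean_b
    var_a, var_b = mean_a - mean_a ** 2, mean_b - mean_b ** 2
    if var_a <= 0 or var_b <= 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def recommend(user_items, index, similarity, top_n=30):
    """Rank unseen items by summed similarity to the items the user already has."""
    scores = {}
    for candidate, candidate_users in index.items():
        if candidate in user_items:
            continue
        score = sum(similarity(index[i], candidate_users) for i in user_items if i in index)
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]
```

Building `index` from clicks only, from all positive interactions, or from impressions gives the different item-based variants listed on the slide (for example SIM_click and SIM_imp on slide 19).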
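The prediction formula on slide 10 can be evaluated directly once the parameters are known. The sketch below takes already-trained parameters as plain Python lists (hypothetical inputs) and does not train anything; according to slide 20, the team trained its models with graphlab.

```python
from typing import Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def predict(mu: float, w_user: float, w_item: float,
            a: Sequence[float], x_user: Sequence[float],
            b: Sequence[float], y_item: Sequence[float],
            u_user: Sequence[float], v_item: Sequence[float]) -> float:
    """p(i, j) = mu + w_i + w_j + a^T x_i + b^T y_j + u_i^T v_j."""
    return (mu + w_user + w_item
            + dot(a, x_user)        # linear term on user side features
            + dot(b, y_item)        # linear term on item side features
            + dot(u_user, v_item))  # interaction of the latent factors


if __name__ == "__main__":
    print(predict(0.1, 0.2, -0.1, [1.0, 0.0], [1.0, 1.0], [0.5], [2.0],
                  [1.0, 2.0], [3.0, 4.0]))
```

Only the last term couples user and item through learned factors; the side-feature terms let cold-start users and items with no events still receive a non-trivial score.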
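Slide 12 lists one-hot encoded side features, including multi-valued token fields such as jobroles, title and tags. The slides mention OneHotEncoder; the plain-Python sketch below is a stand-in under that assumption, not the authors' pipeline. It gives every (field, value) pair its own column and returns the indices of the active columns, a sparse representation that suits token lists. The helper names are hypothetical.

```python
def _as_list(value):
    """Normalize a field value to a list (missing -> [], scalar -> [scalar])."""
    if value is None:
        return []
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]


def build_vocabulary(records, fields):
    """Assign a column index to every (field, value) pair seen in the records."""
    vocab = {}
    for record in records:
        for field in fields:
            for value in _as_list(record.get(field)):
                vocab.setdefault((field, value), len(vocab))
    return vocab


def encode(record, vocab, fields):
    """Return the sorted indices of the active (=1) columns for one record."""
    active = {
        vocab[(field, value)]
        for field in fields
        for value in _as_list(record.get(field))
        if (field, value) in vocab
    }
    return sorted(active)
```

Excluding latitude and longitude for items, as the slide does, simply means leaving those fields out of `fields`.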
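The LSI steps on slide 13 can be sketched with numpy instead of gensim. Several choices are assumptions not stated on the slides: TF-IDF uses raw count × log(N/df), which may differ from gensim's defaults; items are mapped into the latent space by the standard LSI "fold-in" $\Sigma_k^{-1} U_k^\top q$; and similarity is cosine. Function names are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def fit_lsi(user_docs, k):
    """Build a TF-IDF token-document matrix from user documents and factorize it with SVD."""
    vocab = {tok: i for i, tok in enumerate(sorted({t for doc in user_docs for t in doc}))}
    n_docs = len(user_docs)
    df = Counter(tok for doc in user_docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    matrix = np.zeros((len(vocab), n_docs))  # rows: tokens, columns: users
    for j, doc in enumerate(user_docs):
        for tok, count in Counter(doc).items():
            matrix[vocab[tok], j] = count * idf[tok]

    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-12)))  # keep only non-zero singular values
    user_vectors = vt[:k].T  # row j is the latent vector of user j
    return vocab, idf, u[:, :k], s[:k], user_vectors


def fold_in(tokens, vocab, idf, u_k, s_k):
    """Project a new document (e.g. an item's title and tag tokens) into the latent space."""
    q = np.zeros(len(vocab))
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            q[vocab[tok]] = count * idf[tok]
    return (u_k.T @ q) / s_k


def cosine(x, y):
    """Cosine similarity of two latent vectors (0.0 if either is all zeros)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    return float(x @ y / (nx * ny)) if nx > 0 and ny > 0 else 0.0
```

Folding in with $\Sigma_k^{-1} U_k^\top$ puts items in the same coordinates as the rows of $V_k$, so user and item vectors can be compared directly.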
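The linear ensembles on slides 15 to 18 combine per-model scores with fixed weights, plus product terms such as $8 \cdot (\mathrm{FR}_0^{8.0} \cdot \mathrm{SIM}_0)$. The slides do not say how scores are normalized or how candidates are gathered. The sketch below assumes each model's scores are min-max scaled per user and that the candidates are the union of the items scored by any model. Model names follow the slides; the functions are hypothetical.

```python
def min_max_scale(scores):
    """Rescale one model's scores for one user to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def blend(scores_by_model, weights, product_terms=(), top_n=30):
    """Linear ensemble over candidate items for a single user.

    scores_by_model: {model_name: {item_id: score}}; missing items count as 0.
    weights: {model_name: weight} for the plain linear terms.
    product_terms: (weight, model_a, power, model_b) tuples adding weight * a**power * b.
    """
    candidates = set().union(*scores_by_model.values())

    def score(model, item):
        return scores_by_model.get(model, {}).get(item, 0.0)

    blended = {}
    for item in candidates:
        total = sum(w * score(m, item) for m, w in weights.items())
        total += sum(w * score(a, item) ** power * score(b, item)
                     for w, a, power, b in product_terms)
        blended[item] = total
    return sorted(blended, key=lambda item: (-blended[item], item))[:top_n]
```

For example, the "first version" on slide 16 would be `blend(scores, {"FR0": 1, "SIM0": 13, "PI": 1}, [(8, "FR0", 8.0, "SIM0")])`. Negative weights, such as the −0.5 on SIM_pearson on slide 18, fit the same `weights` dictionary.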