3. What is ranking?
The main algorithm in a search engine
Based on ML algorithms
Computes a relevance score for each query-document pair
The best-kept secret of search companies
Today ranking quality depends on:
Evaluation of ranking quality
The method of data set construction
Search engine features
The ML algorithm
4. How to evaluate ranking quality?
Classical approach
Select a set of queries $Q = \{q_1, q_2, \dots, q_{|Q|}\}$ from the logs
For each $q \in Q$ there is a set of documents $q \to D = \{d_1, d_2, \dots, d_{N_q}\}$
For each pair $(q, d)$ ask experts for a grade $\in \{0, 1, 2, 3, 4, 5\}$
Discounted Cumulative Gain
$$DCG = \sum_{q \in Q} \sum_{i=1}^{N_q} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
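A minimal sketch of this metric in Python; the per-query grade lists are hypothetical, and the summation follows the formula above:

```python
import math

def dcg(grades):
    """DCG for one query: grades[i-1] is the expert mark rel_i of the
    document shown at position i of the ranked list."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(grades, start=1))

# Total DCG over the query set Q, summing per-query DCG as above.
grades_by_query = {"q1": [5, 3, 0, 1], "q2": [2, 2, 4]}  # hypothetical marks
total_dcg = sum(dcg(g) for g in grades_by_query.values())
```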
5. How to evaluate ranking quality with clickthrough data?
Evaluation with absolute metrics
Users are shown results from different rankings
Measure statistics about user responses (two of these are sketched after this list)
• Abandonment rate
• Reformulation rate
• Position of first click
• Time to first click
• Etc.
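A minimal sketch of two of these statistics, assuming a hypothetical session record that stores (position, time) pairs for each click:

```python
def abandonment_rate(sessions):
    """Share of sessions with no clicks at all.
    Each session is a dict like {"clicks": [(position, time_sec), ...]}."""
    return sum(1 for s in sessions if not s["clicks"]) / len(sessions)

def first_click_position(session):
    """Rank of the earliest click in a session, or None if abandoned."""
    if not session["clicks"]:
        return None
    return min(session["clicks"], key=lambda c: c[1])[0]

# Hypothetical log: one abandoned session, one with clicks at ranks 2 and 5.
sessions = [{"clicks": []}, {"clicks": [(2, 1.4), (5, 9.0)]}]
print(abandonment_rate(sessions))         # 0.5
print(first_click_position(sessions[1]))  # 2
```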
Evaluation using paired comparisons
Show a combination of results from 2 rankings
Infer relative preferences (a team-draft sketch follows this list)
• Balanced interleaving
• Team-draft interleaving
• Etc.
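A minimal sketch of team-draft interleaving, the second method above; it assumes both rankings are permutations of the same document set, as in this deck, and follows the standard algorithm rather than anything deck-specific:

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Merge two rankings of the same documents into one SERP.
    Teams alternately pick their highest-ranked not-yet-shown doc;
    ties in pick counts are broken by a coin flip."""
    combined, team_of = [], {}
    picks_a = picks_b = 0
    while len(combined) < len(ranking_a):
        a_turn = picks_a < picks_b or (picks_a == picks_b and random.random() < 0.5)
        source = ranking_a if a_turn else ranking_b
        doc = next(d for d in source if d not in team_of)
        team_of[doc] = "A" if a_turn else "B"
        combined.append(doc)
        picks_a, picks_b = picks_a + a_turn, picks_b + (not a_turn)
    return combined, team_of
```

Clicks on the interleaved SERP are then credited to the team that contributed the clicked document, and the ranking with more credited clicks wins the comparison.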
8. Typical problems of the classical approach
Problems with documents
The search index is constantly changing, so we have to rebuild the
ranking model often.
Problems with experts
Experts make mistakes
A group of experts is not equivalent to millions of users
Experts do not ask the queries themselves
We fit the ranking to the assessment instructions (100 pages), not to users
Problems with queries
Queries become irrelevant
Ratings are always outdated
9. Advantages and disadvantages of clickthrough data
Expert judgements                           | Clickthrough data
Thousands per day                           | Millions per day
Expensive                                   | Cheap
Low speed of obtaining                      | High speed of obtaining
Noisy data                                  | Extremely noisy data
Fresh only at the moment of assessment      | Always fresh data
Can evaluate any query (not always correct) | Can't evaluate queries that nobody asks in the SE
Judgements are biased                       | Unbiased (in terms of our flow of queries)
10. How can we use clickthrough data for optimizing TDI?
Simple approach
SERP 1 vs SERP 2
From the 2 rankings, select only the SERPs that win the TDI experiment
11. Optimal SERP construction
Given
A query $q$
A set of documents for $q$: $q \to D = \{d_1, d_2, \dots, d_{N_q}\}$
User sessions with different permutations of the docs from $D$
Idea
Let's construct a permutation of the docs (the optimal permutation, OP) that on
average beats any other permutation of these documents in TDI experiments
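The construction slides themselves are not in this transcript; purely as a hedged illustration, one simple way to build such a permutation is to sort documents by how many pairwise preferences they win in the logged sessions (this sorting heuristic is an assumption, not necessarily the method the deck uses):

```python
from collections import Counter

def optimal_permutation(docs, preferences):
    """Order docs by the number of pairwise-preference wins.
    `preferences` is a list of (winner, loser) pairs inferred from
    user sessions, as on the next slides."""
    wins = Counter(w for w, _ in preferences)
    return sorted(docs, key=lambda d: wins[d], reverse=True)

# Hypothetical preferences: url1 beat url2 twice, url2 beat url3 once.
docs = ["url1", "url2", "url3"]
prefs = [("url1", "url2"), ("url1", "url2"), ("url2", "url3")]
print(optimal_permutation(docs, prefs))  # ['url1', 'url2', 'url3']
```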
12. Information from user session
Example (Case 1)
Query q; SERP with results url1, url2, …, url10; the user clicks url1.
What information have we received from this session?
13. Information from user session
Example (Case 1, continued)
The same SERP for query q, with the click on url1.
Inferred preferences: $url_1 \succ url_i$ for every other shown result, $i = 2, \dots, 10$
Remark:
It is obviously possible to use a more complex click model (CCM, DBN, etc.);
a minimal version of the extraction is sketched below.
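A minimal sketch of this preference extraction under the simple click model the slide uses (a clicked result is preferred to every shown-but-unclicked result); the session structure is hypothetical:

```python
def session_preferences(shown, clicked):
    """Pairwise preferences from one session: each clicked url beats
    every shown-but-unclicked url. `shown` is the SERP in display
    order; `clicked` is the set of clicked urls."""
    return [(c, u) for c in clicked for u in shown if u not in clicked]

# Case 1: a single click on url1.
shown = [f"url{i}" for i in range(1, 11)]
prefs = session_preferences(shown, {"url1"})
# -> [('url1', 'url2'), ('url1', 'url3'), ..., ('url1', 'url10')]
```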
14. Information from user session
Example (Case 2)
The same SERP for query q with results url1, url2, …, url10; this time the user makes three clicks.
What information have we received from this session?
20. Results
Computed Optimized SERPs for the 200,000 most frequent queries (7% of the query flow)
+14% quality for these frequent queries
+1% overall search quality
NOT BAD
Let's try using Optimized SERPs for machine learning to rank
22. Learning from top results
Problems with learning from top results (Example)
23. Learning from top results
Problems with learning from top results
Outside the top there are many documents with quite a different feature distribution.
In all top documents the word “barcelona” appears in the title, so a feature that
describes the presence of query words in the title will be useless for this query.
Solution
Let's sample from the set of unlabeled urls
We need sampling because we can't add all the unlabeled data to the training data
(Diagram: the urls that should be on top, followed by the long tail of unlabeled urls)
24. Semi-supervised learning to rank
Sampling from the unlabeled urls (sketched below)
Pipeline: unlabeled docs → build a self-organizing map → take one sampled url from each cluster
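A minimal sketch of this one-document-per-cluster sampling; k-means stands in for the self-organizing map the slide uses, and the feature matrix is hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def sample_one_per_cluster(features, urls, n_clusters=5, seed=0):
    """Cluster the unlabeled docs and keep the doc closest to each
    centroid, so the sample covers the whole feature distribution."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    sampled = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        sampled.append(urls[members[np.argmin(dists)]])
    return sampled

# Hypothetical feature vectors for 100 unlabeled docs.
X = np.random.rand(100, 20)
urls = [f"url{i}" for i in range(100)]
print(sample_one_per_cluster(X, urls))  # 5 urls, one per cluster
```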
25. Semi-supervised learning to rank
Add the sampled docs as “irrelevant” to the training set
(Diagram: final training data for query q = Optimized SERP urls + sampled urls from the unlabeled set)
26. Semi-supervised learning to rank
Training data set
Training data for query $q_1$, for query $q_2$, …, for query $q_{|Q|}$
Each set consists of Optimized SERP urls plus unlabeled urls (marked as irrelevant), as in the sketch below
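A minimal sketch of assembling one query's training rows; the binary label scheme is an assumption (the slides only say that sampled docs are marked irrelevant):

```python
def training_data_for_query(query, op_urls, sampled_urls):
    """Training rows for one query: Optimized-SERP urls as relevant,
    sampled unlabeled urls as pseudo-irrelevant negatives."""
    return ([(query, url, 1) for url in op_urls] +
            [(query, url, 0) for url in sampled_urls])

# Hypothetical inputs for one query.
rows = training_data_for_query(
    "q1",
    op_urls=["url1", "url2", "url3"],  # from the Optimized SERP
    sampled_urls=["url57", "url91"],   # one per cluster, as above
)
```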
28. Final Results
We obtained an automatic search improvement method
This method can learn an improved ranking function without any explicit
feedback from experts
(Chart: timeline of a TDI experiment against our old ranking based on expert judgments; y-axis from -0.01 to 0.05)
30. Using clickthrough data for online learning to rank
Typical problems with constructing a new ranking formula
We need a large dataset (5-10 million points)
Usually we use active learning to obtain this data
About 10-15 iterations of active learning are needed to obtain a
new ranking formula with the same quality as the current model
We can't use all the available clickthrough data for training our ranking formula
Can we improve the current formula using new clickthrough data?
Can we improve the current formula using ALL the new clickthrough data?
31. Typical ranking formula
Typical ranking formula specification
An ensemble of tens of thousands of decision trees
Trained with a gradient boosting algorithm
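In the standard gradient-boosting form (a general fact about such ensembles, not something stated on the slide), the formula scores a feature vector $x$ as a sum of tree outputs:

$$F(x) = \sum_{m=1}^{M} \gamma_m h_m(x), \qquad M \sim 10^4,$$

where each $h_m$ is a small decision tree and $\gamma_m$ its learned weight.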
33. Typical ranking formula
Typical ranking formula specification
The ranking formula can return only a finite set of values
Each decision tree in the ensemble contains only several predicates
Each query-document pair is described by the set of ensemble predicates it satisfies
Let's use the partition of the multidimensional feature space generated by the
ranking formula as a clustering
Let's remap all the clickthrough data onto this clustering (see the sketch below)
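A minimal sketch of the remapping, with a scikit-learn ensemble standing in for the production formula: the tuple of leaf indices a query-document pair reaches identifies its cell of the partition, and click statistics are aggregated per cell without retraining the trees. All data here is hypothetical:

```python
import numpy as np
from collections import defaultdict
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical stand-in for the production formula: 50 small trees.
X_train, y_train = np.random.rand(500, 20), np.random.rand(500)
formula = GradientBoostingRegressor(n_estimators=50, max_depth=3).fit(X_train, y_train)

def cluster_id(features):
    """Cell of the space partition: the tuple of leaf indices this
    query-document pair reaches in every tree of the ensemble."""
    leaves = formula.apply(features.reshape(1, -1))  # shape (1, n_estimators)
    return tuple(leaves[0].astype(int))

# Remap clickthrough data onto the clustering: per-cell click counts
# give an empirical relevance that every new session can update.
click_log = [(np.random.rand(20), np.random.rand() < 0.3) for _ in range(1000)]
stats = defaultdict(lambda: [0, 0])  # cell -> [clicks, shows]
for features, clicked in click_log:
    cell = cluster_id(features)
    stats[cell][1] += 1
    stats[cell][0] += int(clicked)
```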
36. Online learning to rank results
We obtained an online learning to rank method
The method allows us to use ALL the clickthrough feedback from users
We don't need to retrain the model
The method keeps the current ranking formula up to date with current user behavior