1. Improving IP Geolocation using Query Logs
Ovidiu Dan, Vaibhav Parikh and Brian D. Davison
Lehigh University, Bethlehem, PA, USA
Microsoft Bing, Redmond, WA, USA
2. Outline
1- Introduction
2- Problem statement
3- Previous Work
4- Experiments
5- Results and conclusion
6- Criticism
3. Hello!
I am Mahdi Atawneh
You can find me at:
@mshanak
mahdi@ppu.edu
5. Introduction
▷ IP geolocation database: used to map IP addresses
to their geographical locations.
Start IP   End IP    Country    State      City
1672213    1678654   Palestine  Palestine  Hebron
4123455    4321232   Jordan     Jordan     Amman
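A minimal sketch of how such a range table could be queried, assuming IP addresses are stored as integers and ranges do not overlap. The names `IpRange`, `RANGES`, and `lookup` are hypothetical and are not taken from the paper.

```python
import bisect
from typing import NamedTuple, Optional


class IpRange(NamedTuple):
    start: int
    end: int
    country: str
    state: str
    city: str


# Rows copied from the example table; must stay sorted by start address.
RANGES = [
    IpRange(1672213, 1678654, "Palestine", "Palestine", "Hebron"),
    IpRange(4123455, 4321232, "Jordan", "Jordan", "Amman"),
]
STARTS = [r.start for r in RANGES]


def lookup(ip: int) -> Optional[IpRange]:
    """Return the range containing ip, or None if no range covers it."""
    # Find the last range whose start address is <= ip.
    i = bisect.bisect_right(STARTS, ip) - 1
    if i >= 0 and RANGES[i].start <= ip <= RANGES[i].end:
        return RANGES[i]
    return None
```

Binary search keeps each lookup logarithmic in the number of ranges, which matters when a database holds millions of rows.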
7. Introduction
IP geolocation databases are used for:
1. Content delivery networks (which direct the user to the closest
server).
2. Credit card fraud protection.
3. Advertisements.
4. E-commerce.
5. Location-based licensing (e.g., youtube.com, Netflix).
9. Related Work
Methods previously used to generate IP geolocation databases:
1. Network delay and topology.
2. Web mining.
10. Related Work
1- Network delay and topology:
Relies on the delay of network packets as they
travel between two Internet hosts to estimate the distance between the
hosts.
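The slides give no formula for this conversion. A minimal sketch, assuming the common simplification that signals travel through fiber at roughly two thirds of the speed of light, turns a measured round-trip time into an upper bound on distance. The function name and the `FIBER_FACTOR` constant are assumptions for illustration only.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumed propagation speed in fiber relative to c


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on the distance between two hosts given a round-trip time."""
    one_way_s = (rtt_ms / 1000.0) / 2.0  # half the round trip, in seconds
    return one_way_s * SPEED_OF_LIGHT_KM_S * FIBER_FACTOR
```

Under these assumptions a 10 ms round trip bounds the distance at roughly 1,000 km. Routing detours and queuing delay mean the true distance is usually much smaller, which is one reason the errors of this method can be large.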
11. Related Work
1- Network delay and topology limitations:
• This method needs access to hosts spread throughout
the globe to perform measurements.
• Not all networks support ICMP pings.
• Errors can be tens or hundreds of kilometers.
12. Related Work
2- Web mining
This method uses information gathered
from the web: it extracts locations
mentioned in web pages and assigns
them to the IP address of the server that hosts the
content.
13. Related Work
2- Web mining limitations
This method focuses on the location of the server, not the
end user.
15. Contributions
1. Propose a technique to generate IP geolocation ground truth
data using real-time location information.
2. Study the impact of incorrect IP geolocation on user behavior.
3. Evaluate the accuracy of three IP geolocation databases.
4. Propose a preliminary method to improve IP geolocation databases.
5. Run an experiment to validate improvements to an IP geolocation
database.
17. 1. Ground Truth
• The problem with previous work is the limited number of
IP addresses in the ground truth.
• In this paper, the authors used log files from the Bing (Microsoft) search
engine, which include the real location of users who access
Bing from their mobile devices.
19. 1. Ground Truth
They performed many filtering steps on the log data:
• Ensure that most IP addresses are from fixed broadband
connections.
• Ensure that each IP address has a single location.
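The single-location filter can be sketched as follows. This is an illustration only: it assumes the log has already been reduced to `(ip, city)` pairs, and the function name is hypothetical. The exact filtering criteria used by the authors are not given in these slides.

```python
from collections import defaultdict


def single_location_ips(observations):
    """Keep only the IPs that were always observed in the same city.

    observations: iterable of (ip, city) pairs (assumed input format).
    Returns a dict mapping each retained IP to its single city.
    """
    cities_by_ip = defaultdict(set)
    for ip, city in observations:
        cities_by_ip[ip].add(city)
    return {
        ip: next(iter(cities))
        for ip, cities in cities_by_ip.items()
        if len(cities) == 1
    }
```

An IP seen in more than one city is dropped entirely rather than assigned its most frequent city, which trades coverage for a cleaner ground truth.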
20. 1. Ground Truth
Result:
• A ground truth of 8.4 million IP addresses with real-time
locations.
• The set spans 220 countries.
22. 2. Impact of incorrect IP geolocation
The authors studied the behavior of Bing search engine users
• for 7 days
• across all devices
to determine the impact of incorrect location results.
23. 2. Impact of incorrect IP geolocation
Results:
25. 3. Evaluating IP geolocation Databases
• The authors compared the top three commercial IP
geolocation databases (Vendor A, Vendor B, Vendor C).
26. 3. Evaluating IP geolocation Databases
Result
• None of the three vendors achieved accuracy above 70% at the city
level.
• Vendor C outperformed the others.
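City-level accuracy can be sketched as the fraction of ground-truth IPs for which a vendor reports the same city. This assumes exact city-name matching and dictionary inputs keyed by IP. The paper's actual matching rule may differ (for example, a distance threshold), and `city_accuracy` is a hypothetical name.

```python
def city_accuracy(ground_truth: dict, vendor_db: dict) -> float:
    """Fraction of ground-truth IPs whose vendor city matches exactly.

    ground_truth, vendor_db: dicts mapping IP -> city (assumed format).
    IPs missing from the vendor database count as misses.
    """
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for ip, city in ground_truth.items() if vendor_db.get(ip) == city
    )
    return hits / len(ground_truth)
```

Running the same function over each vendor's database against one shared ground truth makes the resulting numbers directly comparable.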
28. 4. Improve IP geolocation db
• The authors want to improve an existing database
instead of creating a new one.
• They used locations extracted from user queries.
30. 4. Improve IP geolocation db
Datasets
1. Main query logs (180 days of Bing search engine
query logs).
2. Validation query logs (30 days of Bing logs collected before
the main query logs).
3. Baseline: the three vendors mentioned earlier.
4. Ground truth: 8.4 million IP addresses with their locations.
31. 4. Improve IP geolocation db
Approach:
They propose improving IP geolocation databases by correcting
the location of certain IP ranges using cities extracted from
user queries.
32. 4. Improve IP geolocation db
Approach: steps
1. Extract queries.
2. Filter impressions.
3. Extract locations.
4. Reverse geocode locations (Bing API).
5. Aggregate locations.
6. Compute the popularity of each location.
33. 4. Improve IP geolocation db
Approach: steps
7. Score the location candidates in each IP range.
8. Decide whether to keep the original location or modify it
based on queries.
9. Test the modified geolocation database against the ground
truth.
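Steps 5 to 8 can be sketched as follows. The slides do not give the scoring formula, so this example assumes one plausible choice: a city's share of location mentions inside the IP range divided by its share across all queries, so that globally popular cities do not dominate. The function names and the `min_mentions` threshold are hypothetical.

```python
from collections import Counter


def score_candidates(range_counts: Counter, global_counts: Counter) -> dict:
    """Score each city mentioned in queries from one IP range.

    Assumed score: local share of mentions divided by global share.
    """
    range_total = sum(range_counts.values())
    global_total = sum(global_counts.values())
    scores = {}
    for city, n in range_counts.items():
        global_share = global_counts[city] / global_total
        if global_share == 0:
            continue  # no popularity data for this city; skip it
        scores[city] = (n / range_total) / global_share
    return scores


def choose_location(original, range_counts, global_counts, min_mentions=20):
    """Keep the original city unless a candidate has enough supporting queries."""
    scores = score_candidates(range_counts, global_counts)
    eligible = {
        city: s for city, s in scores.items()
        if range_counts[city] >= min_mentions
    }
    if not eligible:
        return original
    return max(eligible, key=eligible.get)
```

The minimum-mention threshold is a guard against overwriting a vendor's location on the basis of a handful of queries. The modified database would then be checked against the ground truth, as in step 9.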
36. 5. Validate improvements
They carried out an experiment on the Bing search engine:
• 7 days
• 850,000 unique users
• 1.6 query
• targeted the Mexico market
38. Criticism
• The authors discussed real-time locations in detail
but did not use them in their improvements.
• Many ideas are repeated throughout the discussion in the
paper.