Improving IP Geolocation using Query Logs
Ovidiu Dan, Vaibhav Parikh and Brian D. Davison
Lehigh University, Bethlehem, PA, USA
Microsoft Bing, Redmond, WA, USA
1
Outline
1- Introduction
2- Problem statement
3- Previous Work
4- Experiments
5- Results and conclusion
6- Criticism
2
Hello!
I am Mahdi Atawneh
You can find me at:
@mshanak
mahdi@ppu.edu
3
1.
Introduction
4
▷ IP geolocation database: maps IP addresses
to their geographical locations.
Introduction
Start IP   End IP    Country    State      City
1672213    1678654   Palestine  Palestine  Hebron
4123455    4321232   Jordan     Jordan     Amman
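
A lookup against a table like this one can be sketched as a binary search over the range starts. This is a minimal sketch, assuming the ranges are stored as integers, sorted by start, and non-overlapping; the rows mirror the sample table, and `RANGES`, `STARTS` and `lookup` are hypothetical names, not part of the paper.

```python
import bisect

# Rows mirror the sample table: (start_ip, end_ip, country, state, city).
# Assumption: sorted by start_ip, with no overlapping ranges.
RANGES = [
    (1672213, 1678654, "Palestine", "Palestine", "Hebron"),
    (4123455, 4321232, "Jordan", "Jordan", "Amman"),
]
STARTS = [row[0] for row in RANGES]


def lookup(ip_as_int):
    """Return (country, state, city) for an integer IP, or None if no range covers it."""
    # Find the last range whose start is <= the IP.
    i = bisect.bisect_right(STARTS, ip_as_int) - 1
    if i < 0:
        return None
    start, end, country, state, city = RANGES[i]
    # The IP may fall in the gap after this range ends.
    return (country, state, city) if ip_as_int <= end else None
```

An IP that falls between two ranges returns `None`, which is one way to represent addresses the database does not cover.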
5
6
IP geolocation databases are used for:
1. Content delivery networks (which direct the user to the closest
server).
2. Credit card fraud protection.
3. Advertisements.
4. E-commerce.
5. Location-based licensing (e.g. youtube.com, Netflix).
Introduction
7
2.
Related work
8
Related Work
Methods used previously to generate IP geolocation databases:
1. Network delay and topology.
2. Web mining.
9
Related Work
1- Network delay and topology:
Relies on relating the delay of network packets traveling
between two Internet hosts to the distance between the
hosts.
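
The intuition can be illustrated with a rough upper bound on distance from a measured round-trip time. This is a minimal sketch and not the method of any specific prior work: it assumes signals travel through fiber at roughly two thirds of the speed of light in vacuum, and `max_distance_km` is a hypothetical helper.

```python
# Speed of light in vacuum, in kilometers per millisecond.
SPEED_OF_LIGHT_KM_PER_MS = 299.792458
# Assumption: propagation in fiber is roughly 2/3 of the vacuum speed.
FIBER_FACTOR = 2.0 / 3.0


def max_distance_km(rtt_ms):
    """Upper bound on the distance between two hosts given a round-trip time in ms."""
    one_way_ms = rtt_ms / 2.0  # the packet travels there and back
    return one_way_ms * SPEED_OF_LIGHT_KM_PER_MS * FIBER_FACTOR
```

Real paths add queuing and routing detours, so the true distance is usually well below this bound, which is one reason delay-based estimates can be far off.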
10
Related Work
1- Network delay and topology limitations:
• This method needs access to hosts spread throughout
the globe to perform measurements.
• Not all networks support ICMP pings.
• Errors can be tens to hundreds of kilometers.
11
Related Work
2- Web mining
This method uses information gathered
from the web: it extracts locations
mentioned in web pages and assigns
them to the IP of the server that hosts the
content.
12
Related Work
2- Web mining limitations
This method focuses on the location of the server, not the
end user.
13
3.
Contributions
14
Contributions
1. Propose a technique to generate IP geolocation ground truth
data using real time location information.
2. Study the impact of incorrect IP geolocation on user behavior.
3. Evaluate the accuracy of three IP geolocation databases.
4. Propose a preliminary method to improve IP geolocation db.
5. Run experiment to validate improvements to an IP geolocation
database.
15
1. Ground Truth
• The problem with previous work is the limited number of
IPs in the ground truth.
• In this paper, the authors used log files from the Bing (Microsoft)
search engine, which include the real location of users who
access Bing from their mobile devices.
17
1. Ground Truth
▷Log file example
18
1. Ground Truth
They performed several filtering steps on the log data:
• Ensure that most IP addresses are from fixed broadband
connections.
• Ensure that each IP has a single location.
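
The single-location filter can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not give the log schema or the exact matching rule, so the `(ip, city)` record shape, the exact-match criterion and the `single_location_ips` helper are all hypothetical.

```python
from collections import defaultdict


def single_location_ips(records):
    """Keep only IPs observed in exactly one city.

    records: iterable of (ip, city) pairs (assumed log shape).
    Returns a dict mapping each kept IP to its single city.
    """
    cities_by_ip = defaultdict(set)
    for ip, city in records:
        cities_by_ip[ip].add(city)
    # Drop any IP that was seen in more than one city.
    return {
        ip: next(iter(cities))
        for ip, cities in cities_by_ip.items()
        if len(cities) == 1
    }
```

IPs seen in several cities are likely mobile or shared gateways, so dropping them keeps the ground truth closer to fixed connections.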
19
1. Ground Truth
Result:
• A ground truth of 8.4 million IP addresses with real-time
locations.
• The set spans 220 countries.
20
2. Impact of incorrect IP geolocation
The authors studied the behavior of Bing search engine users
• for 7 days
• across all devices
to measure the impact of incorrect location results.
22
2. Impact of incorrect IP geolocation
Results:
23
3. Evaluating IP geolocation Databases
• The authors compared the top three commercial IP
geolocation databases (Vendor A, Vendor B, Vendor C).
25
3. Evaluating IP geolocation Databases
Result
• They found that none of the three vendors achieved accuracy
above 70% at the city level.
• Vendor C outperformed the others.
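
City-level accuracy against a ground truth can be computed as the fraction of ground-truth IPs whose database city matches. This is a minimal sketch, assuming both sides are plain IP-to-city mappings and that a match means exact string equality; the paper's actual matching criterion is not given in the slides, and `city_accuracy` is a hypothetical helper.

```python
def city_accuracy(ground_truth, vendor_db):
    """Fraction of ground-truth IPs for which the vendor reports the same city.

    ground_truth, vendor_db: dicts mapping IP -> city name.
    IPs missing from vendor_db count as wrong.
    """
    if not ground_truth:
        return 0.0
    correct = sum(
        1 for ip, city in ground_truth.items() if vendor_db.get(ip) == city
    )
    return correct / len(ground_truth)
```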
26
4. Improve IP geolocation db
• The authors want to improve an existing database
instead of creating a new one.
• They used locations extracted from user queries.
28
4. Improve IP geolocation db
29
4. Improve IP geolocation db
Datasets
1. Main query log: 180 days of Bing search engine
query logs.
2. Validation query log: 30 days of Bing logs collected before
the main query log.
3. Baseline: the three vendors mentioned earlier.
4. Ground truth: 8.4 million IPs with their locations.
30
4. Improve IP geolocation db
Approach:
They propose improving IP geolocation databases by correcting
the location of certain IP ranges using cities extracted from
user queries.
31
4. Improve IP geolocation db
Approach: steps
1. Extract queries.
2. Filter impressions.
3. Extract locations.
4. Reverse geocode locations (Bing API).
5. Aggregate locations.
6. Compute the popularity of each location.
32
4. Improve IP geolocation db
Approach: steps
7. Score the location candidates in each IP range.
8. Decide whether to keep the original location or modify it
based on queries.
9. Test the modified geolocation database against the ground
truth
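
The aggregation, popularity, scoring and decision steps above can be illustrated with a toy example. This is only a sketch under stated assumptions and not the authors' scoring function: it down-weights each city by a global popularity value, normalizes the weights within the IP range, and replaces the original city only when the top candidate's share passes a threshold. The `choose_city` helper, the weighting formula and the `min_score` default are all hypothetical.

```python
from collections import Counter


def choose_city(original_city, query_cities, city_popularity, min_score=0.5):
    """Pick a city for one IP range from cities mentioned in its queries.

    original_city: city currently stored in the database for this range.
    query_cities: list of cities extracted from queries in this range.
    city_popularity: dict city -> global popularity weight (assumed >= 1).
    min_score: hypothetical threshold on the top candidate's share.
    """
    counts = Counter(query_cities)  # aggregate locations per range
    # Down-weight cities that are queried often everywhere.
    weighted = {
        city: n / city_popularity.get(city, 1.0) for city, n in counts.items()
    }
    total = sum(weighted.values())
    if total == 0:
        return original_city  # no evidence: keep the original location
    best_city, best_weight = max(weighted.items(), key=lambda kv: kv[1])
    score = best_weight / total
    return best_city if score >= min_score else original_city
```

The modified database would then be compared against the ground truth, using the same accuracy measure applied to the vendor baselines.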
33
4. Improve IP geolocation db
Results:
34
5. Validate improvements
They carried out an experiment on the Bing search engine:
• 7 days
• 850,000 unique users
• 1.6 query
• targeted the Mexico market
36
5. Validate improvements
The results:
37
Criticism
• The authors discussed the real-time locations in detail
but did not use them in their improvements.
• Many ideas are repeated across the discussion in the
paper.
38
Thanks
