Set Expansion
Meera Parmar (201305529)
meera.parmar@students.iiit.ac.in
Nishith Maheshwari (201002016)
nishith.maheshwari@students.iiit.ac.in
Vandan Mujadia (201323602)
vandan.mujadia@research.iiit.ac.in
Venessa Tauro (201101032)
venessaroshni.tauro@students.iiit.ac.in
Set Expansion: What is it?
❖ Set expansion automatically expands a small set of given seed entities into a more complete set of entities of the same kind.
For example:
Input set:
• {Sachin Tendulkar, Dhoni, Rahul Dravid}
Expanded set:
• {Amit Bhandari, Syed Abid Ali, Parthiv Patel, Murali Kartik, …}
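The idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the function name `expand_set`, the `known_lists` dictionary, and its toy contents are assumptions made up for illustration. Candidates are simply the other members of any known list that contains at least one seed, ordered by how many seeds each such list shares.

```python
from collections import Counter


def expand_set(seeds, known_lists):
    """Return entities that co-occur with the seeds in known lists.

    `known_lists` maps a list name to its member entities (hypothetical data).
    """
    seeds = {s.lower() for s in seeds}
    votes = Counter()
    for members in known_lists.values():
        members = {m.lower() for m in members}
        overlap = len(seeds & members)
        if overlap == 0:
            continue  # this list says nothing about the seeds
        for entity in members - seeds:
            votes[entity] += overlap  # more shared seeds means a stronger vote
    return [entity for entity, _ in votes.most_common()]


# Toy data for illustration only.
known_lists = {
    "cricketers": ["Sachin Tendulkar", "Dhoni", "Rahul Dravid",
                   "Parthiv Patel", "Murali Kartik"],
    "films": ["Peepli Live", "Khatta Meetha"],
}
expanded = expand_set(["Sachin Tendulkar", "Dhoni", "Rahul dravid"], known_lists)
```

With this toy data, `expanded` contains only the two remaining cricketers; the film list contributes nothing because it shares no seed.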
Tools used
❖ Stanford POS Tagger (part-of-speech tagger)
• used to eliminate non-nominal entities from the parsed list.
❖ Stanford NER (Named Entity Recognizer)
• used in ranking to recognize proper names so that entities can be ordered by relevance.
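As a rough sketch of the POS-based filtering step, the Python snippet below works on entities that have already been tagged; it does not call the Stanford tagger itself. The tags in the example are hand-written, and the rule "keep an entity if at least one of its tokens carries a noun tag" is an assumption, since the slides do not state the exact criterion.

```python
# Penn Treebank noun tags (singular, plural, proper singular, proper plural).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def is_nominal(tagged_tokens):
    """True if any (token, tag) pair in the entity carries a noun tag."""
    return any(tag in NOUN_TAGS for _, tag in tagged_tokens)


def filter_nominal(tagged_entities):
    """Drop entities with no noun token; return the rest as plain strings."""
    return [
        " ".join(token for token, _ in tokens)
        for tokens in tagged_entities
        if is_nominal(tokens)
    ]


# Hand-tagged toy input; a real pipeline would get these tags from a tagger.
tagged_entities = [
    [("peepli", "NNP"), ("live", "NNP")],
    [("see", "VB"), ("also", "RB")],       # typical list-page noise
    [("khatta", "NNP"), ("meetha", "NNP")],
]
kept = filter_nominal(tagged_entities)
```

Here the "see also" entry is discarded because neither token is tagged as a noun.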
Approach
(parsing and index creation)
A corpus-based approach (Wikipedia dataset):
• parse ‘list of’ pages to get entity lists.
• parse entity lists based on the ‘category’ given in each wiki page.
• parse entity lists from ‘Infobox’, ‘Taxobox’, ‘Geobox’, etc.
• parse entity lists from wiki page contents.
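The first two parsing steps above can be sketched in Python as follows. This is an illustrative reading, not the project's parser: the regular expressions, the function `index_pages`, and the sample wikitext are assumptions, and real Wikipedia markup has many more cases (templates, nested links, tables) than this handles.

```python
import re
from collections import defaultdict

# [[Category:Name]] or [[Category:Name|sort key]]
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)")
# Bullet lines that start with a wiki link, e.g. "* [[Peepli Live]]"
LIST_ITEM_RE = re.compile(r"^\*+\s*\[\[([^\]|#]+)", re.MULTILINE)


def index_pages(pages):
    """Build two indexes from {page title: wikitext}.

    category_index: category name -> titles of pages in that category
    list_index:     'list of ...' page title -> entities linked in its bullets
    """
    category_index = defaultdict(set)
    list_index = defaultdict(set)
    for title, wikitext in pages.items():
        for category in CATEGORY_RE.findall(wikitext):
            category_index[category.strip().lower()].add(title.lower())
        if title.lower().startswith("list of "):
            for item in LIST_ITEM_RE.findall(wikitext):
                list_index[title.lower()].add(item.strip().lower())
    return category_index, list_index


# Made-up wikitext for illustration only.
pages = {
    "Peepli Live": "A film.\n[[Category:Example films]]\n[[Category:Satire|Peepli]]",
    "Khatta Meetha": "A film.\n[[Category:Example films]]",
    "List of example films": "* [[Peepli Live]]\n* [[Khatta Meetha]]\n* [[Raajneeti]]",
}
category_index, list_index = index_pages(pages)
```

Keeping the two indexes separate matches the search step described later, which consults the category index before the ‘list of’ index.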
parsing and indexing
Approach
(ranking categories and search)
Ranking of categories
❖ entities are ranked by tf-idf score
❖ entities are ranked by word-vector distance score
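The slides do not give the exact scoring formulas, so the following Python sketch is only one plausible reading. It treats each category as a document and each entity as a term: a seed's term frequency in a category is 1 / (category size), and a smoothed idf is used so that seeds found in every category still count. The second function ranks candidates by cosine similarity to the centroid of the seed vectors. All names, the toy categories, and the 2-dimensional vectors are assumptions.

```python
import math


def rank_categories(seeds, categories):
    """Rank categories by the summed tf-idf of the seeds they contain."""
    n = len(categories)
    scores = {}
    for name, members in categories.items():
        score = 0.0
        for seed in seeds:
            if seed in members:
                df = sum(1 for m in categories.values() if seed in m)
                idf = math.log((1 + n) / (1 + df)) + 1  # smoothed idf
                score += (1 / len(members)) * idf       # tf = 1 / category size
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_vector(seeds, candidates, vectors):
    """Rank candidates by cosine similarity to the mean seed vector."""
    dims = len(vectors[seeds[0]])
    centroid = [sum(vectors[s][i] for s in seeds) / len(seeds) for i in range(dims)]
    return sorted(candidates, key=lambda c: cosine(vectors[c], centroid), reverse=True)


# Toy data for illustration only.
seeds = ["raajneeti", "anjaana anjaani"]
categories = {
    "narrow film list": {"raajneeti", "anjaana anjaani", "peepli live"},
    "broad film list": {"raajneeti", "anjaana anjaani", "peepli live",
                        "khatta meetha", "thanks maa", "antardwand"},
    "cricketers": {"sachin tendulkar", "dhoni"},
}
vectors = {
    "raajneeti": [1.0, 0.1],
    "anjaana anjaani": [0.9, 0.2],
    "peepli live": [0.8, 0.3],
    "sachin tendulkar": [0.1, 1.0],
}
ranked_categories = rank_categories(seeds, categories)
ranked_entities = rank_by_vector(seeds, ["sachin tendulkar", "peepli live"], vectors)
```

In this toy run the narrow list outranks the broad one because the seeds make up a larger share of it, and the category with no seeds is dropped.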
Search
❖ First, search the ‘category list’ index.
❖ If no list is found, search the ‘list of pages list’ index.
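The two-stage lookup can be sketched in Python as below. The function name, the index layout (name mapped to a set of lower-cased entities), and the rule that a list "matches" only when it contains every seed are assumptions; the slides only state the order in which the indexes are consulted.

```python
def find_lists(seeds, category_index, list_of_index):
    """Look up the seeds in the category index, then fall back to the
    'list of' index. Returns (index used, matching lists)."""
    seeds = {s.lower() for s in seeds}

    def matching(index):
        # Assumed rule: a list matches when it contains all seeds.
        return {name: members for name, members in index.items()
                if seeds <= members}

    hits = matching(category_index)
    if hits:
        return "category", hits
    return "list_of", matching(list_of_index)


# Toy indexes for illustration only.
category_index = {"example films": {"raajneeti", "anjaana anjaani"}}
list_of_index = {
    "list of example cricketers": {"sachin tendulkar", "dhoni",
                                   "rahul dravid", "parthiv patel"},
}
source_a, hits_a = find_lists(["Raajneeti", "Anjaana Anjaani"],
                              category_index, list_of_index)
source_b, hits_b = find_lists(["Dhoni", "Rahul Dravid"],
                              category_index, list_of_index)
```

The first query is answered from the category index; the second finds nothing there and falls back to the ‘list of’ index.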
Searching and Ranking
Experiment
Input:
raajneeti, anjaana anjaani, my name is khan
Output:
jaane kahan se aayi hai
antardwand
pyaar impossible
peepli live
atithi tum kab jaoge
mr singh mrs mehta
khatta meetha
anjaana anjaani
thanks maa
khelein hum jee jaan sey
Applications
❖ Named entity recognition
❖ Evaluation of question answering systems
❖ Text summarisation
❖ Search result suggestion
❖ etc.
Last words
❖ In this project we devised a method for set expansion over Wikipedia data using a simple yet effective approach.
❖ This unsupervised method can be used to extend an entity list independently of the language.
❖ For validation, we tested the approach on multiple domains and obtained acceptable results (shown in the video).
References
❖ http://mlg.eng.cam.ac.uk/zoubin/papers/bsets-nips05.pdf
❖ https://www.cs.cmu.edu/afs/cs/Web/People/wcohen/postscript/icdm-2007.pdf
❖ http://www.dfki.de/~neumann/InformationExtractionLecture2011/sessions/7-SetExpansion.pdf
Project Links
❖ Project description: http://researchweb.iiit.ac.in/~vandan.mujadia/
❖ Project demo: https://www.youtube.com/watch?v=XZez5aMBNNc&feature=youtu.be
❖ Project presentation: http://www.slideshare.net/VandanMujadia/set-expansioniiit-hireteam-no14
❖ Project codebase: https://github.com/vmujadia/IIIT-H-IRE14
Thank you
