2. Set Expansion: What is it?
❖ Set expansion automatically expands a given set of seed
entities into a more complete set.
For example:
Input set:
• {Sachin Tendulkar, Dhoni, Rahul Dravid}
Expanded set:
• {amit bhandari, syed abid ali, parthiv patel, murali kartik,…}
3. Tools used
❖ Stanford POS (part-of-speech) tagger
• used to eliminate non-nominal entities from the parsed
list.
❖ Stanford NER (Named Entity Recognizer)
• used in ranking to recognize proper names, so that
entities can be put in order of relevance.
5. A corpus-based approach (Wikipedia dataset)
• Parse ‘List of’ pages to get entity lists.
• Parse entity lists based on the ‘category’ given on each
wiki page.
• Parse entity lists from ‘Infobox’, ‘Taxobox’, ‘Geobox’,
etc.
• Parse entity lists from wiki page contents.
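The first two parsing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the sample wikitext, the regular expressions, and the function names `extract_list_entities` and `extract_categories` are all assumptions made for the example, and real Wikipedia dump pages need far more robust handling (templates, nested links, tables).

```python
import re

# Hypothetical wikitext snippet; real dump pages are much messier.
SAMPLE_WIKITEXT = """\
== Players ==
* [[Sachin Tendulkar]]
* [[Rahul Dravid|Dravid]], batsman
* [[MS Dhoni]]
[[Category:Indian cricketers]]
[[Category:Lists of cricketers]]
"""

# First wiki link on a bulleted line: "* [[Target|label]]" -> "Target".
LIST_ITEM = re.compile(r"^\*+[ \t]*\[\[([^\]|#]+)", re.MULTILINE)
# Category tags: "[[Category:Name]]" or "[[Category:Name|sortkey]]".
CATEGORY = re.compile(r"\[\[Category:([^\]|]+)")


def extract_list_entities(wikitext):
    """Return the link target of each bulleted item (assumed to be the entity)."""
    return [m.strip() for m in LIST_ITEM.findall(wikitext)]


def extract_categories(wikitext):
    """Return the category names the page is tagged with."""
    return [m.strip() for m in CATEGORY.findall(wikitext)]


if __name__ == "__main__":
    print(extract_list_entities(SAMPLE_WIKITEXT))
    print(extract_categories(SAMPLE_WIKITEXT))
```

Taking only the first link per bullet is a simplifying choice: on most ‘List of’ pages the entity is the first link and later links are descriptive.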
8. Ranking of categories
❖ Entities are ranked by TF-IDF score.
❖ Entities are ranked by word-vector distance score.
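The slides do not spell out how the two scores are computed, so the following is only one plausible reading. It assumes each parsed list is treated as a "document" and each entity as a "term" for the TF-IDF score, and it uses cosine similarity to the centroid of the seed vectors as the word-vector score (a similarity, so higher means closer). The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(lists, seeds):
    """TF-IDF-style score per candidate (assumed: list = document, entity = term)."""
    n = len(lists)
    # Document frequency: in how many lists does each entity occur?
    df = Counter(e for lst in lists for e in set(lst))
    scores = Counter()
    for lst in lists:
        if not seeds & set(lst):
            continue  # only lists that share a seed contribute
        for entity, count in Counter(lst).items():
            if entity not in seeds:
                scores[entity] += (count / len(lst)) * math.log(n / df[entity])
    return dict(scores)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_scores(vectors, seeds):
    """Similarity of each non-seed entity to the centroid of the seed vectors."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    centroid = [sum(dim) / len(seed_vecs) for dim in zip(*seed_vecs)]
    return {e: cosine(v, centroid) for e, v in vectors.items() if e not in seeds}


if __name__ == "__main__":
    toy_lists = [["a", "b", "c"], ["a", "d"], ["x", "y"]]
    toy_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "x": [0.0, 1.0]}
    print(sorted(tfidf_scores(toy_lists, {"a"}).items(), key=lambda kv: -kv[1]))
    print(vector_scores(toy_vectors, {"a"}))
```

In this sketch a candidate that co-occurs with a seed in a short, specific list outranks one from a long list, which is one way a TF-IDF weighting can favour tightly related entities.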
Search
❖ First, search the ‘category list’ index.
❖ If no list is found there, search the ‘list of pages
list’ index.
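The two-step lookup above might look like the sketch below. The index layout (a plain dict from list name to a set of member entities), the rule that a list must contain every seed to count as a match, and the name `find_lists` are all assumptions for illustration; the project's real indexes are not described in the slides.

```python
def find_lists(seeds, category_index, list_page_index):
    """Return (source, matching list names), preferring the category index.

    Both indexes are assumed to map a list name to the set of its entities.
    """
    tiers = (("category", category_index), ("list_of", list_page_index))
    for source, index in tiers:
        # A list matches when it contains every seed (assumed matching rule).
        hits = [name for name, members in index.items() if seeds <= members]
        if hits:
            return source, hits  # stop at the first tier that yields a list
    return None, []


if __name__ == "__main__":
    # Hypothetical toy indexes.
    categories = {"Indian cricketers": {"sachin tendulkar", "rahul dravid"}}
    list_pages = {"List of 2010 Bollywood films": {"raajneeti", "peepli live"}}
    print(find_lists({"raajneeti"}, categories, list_pages))
```

Falling back only when the first index returns nothing keeps the category results from being diluted by the second index.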
10. Experiment
Input:
raajneeti, anjaana anjaani, my name is khan
Output:
jaane kahan se aayi hai
antardwand
pyaar impossible
peepli live
atithi tum kab jaoge
mr singh mrs mehta
khatta meetha
anjaana anjaani
thanks maa
khelein hum jee jaan sey
11. Applications
❖ Named entity recognition
❖ Evaluation of question answering systems
❖ Text summarisation
❖ Search result suggestion
❖ etc.
12. Last words
❖ In this project we devised a method for set
expansion on Wikipedia data using a simple
yet effective approach.
❖ This unsupervised method can be used to extend an entity
list independently of the language.
❖ For validation, we tested the approach on multiple
domains and obtained acceptable results (shown in the
video).