This document provides an overview and instructions for an exercise on acquiring knowledge from social networking sites using Twitter and Facebook. It discusses downloading relevant tools and libraries and creating sample applications to extract information and analyze message content and structures from the social media platforms. The scenario involves building a corpus of messages from the two sites during the 2010 FIFA World Cup in South Africa and using it to understand popular topics of discussion.
1. EKAW 2010 • Tutorial T3
Friday • 15th October 2010
Knowledge Acquisition from Social Networking Sites
Z. Zhang, A.E. Cano, K. Elbedweihy, A.-S. Dadzie
6. Corpus Generation using twitter
Ex.1 REST API
Twitter twitter = new TwitterFactory().getInstance();
try {
    //We request the public timeline, which returns a list of Status
    ResponseList<Status> publicTimeline = twitter.getPublicTimeline();
    /**
     * Complete this exercise and analyse the structure and content
     * of each of the Status.
     * Have a look at the java doc of the Status Class, or just
     * check the available methods in your IDE
     */
    Iterator<Status> it = publicTimeline.iterator();
    while (it.hasNext()) {
        Status st = it.next();
        //TODO check what are the info you can get from a Status.
    }
} catch (TwitterException e) {
    e.printStackTrace();
}
• Try it yourself: edit and run StatusTest.java

Corpus Generation using twitter
Ex.1 REST API
• Answer
try {
    ResponseList<Status> publicTimeline = twitter.getPublicTimeline();
    //*TODO Complete exercise and analyse structure and content of each status
    Iterator<Status> it = publicTimeline.iterator();
    GeoLocation geoLocation;
    Place place;
    while (it.hasNext()) {
        Status st = it.next();
        log.info(st.getText());
        log.info(st.getSource());
        if ((geoLocation = st.getGeoLocation()) != null)
            log.info(geoLocation.toString());
        if ((place = st.getPlace()) != null) {
            log.info(place.getFullName());
            log.info(place.getBoundingBoxCoordinates().toString());
        }
    }
} catch (TwitterException e) {
    e.printStackTrace();
}

Corpus Generation using twitter
Ex.1 REST API
• Output: timeline status (each status text is followed by its source)
??????????!!??888888888 RT @nico_news: ???????????????????????????????????????? http://bit.ly/aZcvfl
<a href="http://twipple.jp/" rel="nofollow">?????/twipple</a>
Southampton v Tranmere: Preview followed by live coverage of Saturday's game between Southampton and Tranmere in L... http://bit.ly/9N802N
<a href="http://twitterfeed.com" rel="nofollow">twitterfeed</a>
Laper gueeee
<a href="http://www.snaptu.com" rel="nofollow">Snaptu.com</a>
?????????????????????????? / ??????????????????????????
<a href="http://www.echofon.com/" rel="nofollow">Echofon</a>
Changing the Language of Oppression http://bit.ly/aXA4w3 #specialneeds
<a href="http://www.tweetdeck.com" rel="nofollow">TweetDeck</a>
Are you attending the SuperSwarm at Jewel, Piccadilly tonight? Let's get an idea of numbers via my poll @ www.theprgeek.co.uk #superswarmLDN
web
Simon Cowell To Receive Special Emmy Award: October 7, 2010: Music mogul and former American Idol judge Simo... http://tinyurl.com/299o5gg
<a href="http://twitterfeed.com" rel="nofollow">twitterfeed</a>
"Wajahmu seperti bulan" --» ini artinya ngatain kan yah? Org bulan bolong2
<a href="http://blackberry.com/twitter" rel="nofollow">Twitter for BlackBerry®</a>
FM????????????
<a href="http://stone.com/Twittelator" rel="nofollow">Twittelator</a>
???? [????:?????/????????????????????????]559 #colopl_msg
<a href="http://t.colopl.jp/t/" rel="nofollow">Colotwi</a>
pikiran saya cabangnya banyak, jd pusing sendiri..penuh rasanya ni kepala
<a href="http://m.tweete.net" rel="nofollow">m.tweete.net</a>...

Corpus Generation using twitter
Ex.2 Search API
• Allows interaction with twitter search and trends data
  – top topics that are currently trending on Twitter
• It exposes the following methods:
  – search,
  – trends,
  – trends/current, trends/daily, trends/weekly
• The Search API supports, among others, the following operators for constructing a query string

7. Corpus Generation using twitter
Ex.2 Search API
sinceid:        Specifies the id of the status from which to start the search
untilid:        Specifies the id of the status from which to end the search
Since: Until:   Statuses produced since (or until) a specified date (e.g. 2010-03-10)
Filter:links    Retrieves tweets without links
from:           Retrieves statuses from a given user (e.g. from:fifa)
lang:           Retrieves statuses in a given language
OR              e.g., mentioning Mexico OR France
:)              e.g., containing football with a positive attitude (e.g. football :) )
Negation        e.g., mentioning beer but not root
Source:         e.g., containing football entered via TwitterFeed (e.g. news source:TwitterFeed)

Corpus Generation using twitter
Ex.2 Search API
– Wait! You will need to complete the code for it to actually do something! - Edit the class:
  ekaw.kasna.twitter.QueryTest
Exercise
Query query = new Query();
query.query("football");
//*TODO Modify the query object, and search for
//today's tweets (in english) related to football
//*TODO Restrict your results to tweets generated
//within 300 kilometers of Johannesburg, South Africa
// hint: use Query's geoCode method, the
// Kilometers unit is given as Query.KILOMETERS
// hint: Johannesburg's lat: -26.12, long: 28.2 (southern latitudes are negative)
• Try it yourself: run QueryTest.java
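
The Search API operators listed above are plain text fragments inside the query string, so they can be composed before the string is handed to `query.query(...)`. The following is a minimal sketch of that composition only; the class `SearchQueryBuilder` and its method names are hypothetical helpers for illustration and are not part of Twitter4J. It assumes the operator spellings shown on the slide (`OR`, `from:`, `source:`) and the `-` prefix for negation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: composes a Search API query string from operators. */
final class SearchQueryBuilder {
    private final List<String> parts = new ArrayList<String>();

    /** Adds a plain search term, e.g. "football". */
    SearchQueryBuilder term(String word) {
        parts.add(word);
        return this;
    }

    /** Adds an alternative, e.g. "Mexico OR France". */
    SearchQueryBuilder anyOf(String first, String second) {
        parts.add(first + " OR " + second);
        return this;
    }

    /** Excludes a term using the "-" prefix, e.g. "-root". */
    SearchQueryBuilder not(String word) {
        parts.add("-" + word);
        return this;
    }

    /** Restricts results to one user, e.g. "from:fifa". */
    SearchQueryBuilder from(String user) {
        parts.add("from:" + user);
        return this;
    }

    /** Restricts results to one client application, e.g. "source:TwitterFeed". */
    SearchQueryBuilder source(String client) {
        parts.add("source:" + client);
        return this;
    }

    /** Joins all fragments with single spaces. */
    String build() {
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String q = new SearchQueryBuilder().term("beer").not("root").build();
        System.out.println(q);
    }
}
```

The resulting string would be passed to the existing `query.query(...)` call in QueryTest; the geographic restriction still goes through the `geoCode` method named in the hints.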

Corpus Generation using twitter
Ex.2 Search API
– Answer
Query query = new Query();
query.query("football");
//*TODO Modify the query object, and search for
//today's tweets related to football
//*TODO Restrict your results to tweets generated
//within 300 kilometers of Johannesburg, South Africa
// hint: use Query's geoCode method, the
// Kilometers unit is given as Query.KILOMETERS
// hint: Johannesburg's lat: -26.12, long: 28.2
query.geoCode(new GeoLocation(-26.12, 28.2), 300, Query.KILOMETERS);

Corpus Generation using twitter
Ex.2 Search API
• Output: query request for “football” near “Johannesburg”
hits:15
MQMhlanzi:Total Football 360: Bafana Eager to Keep the Momentum of Winning! http://t.co/xOPTaY9
Benleeds:RT @BumbleCricket: any big shot yank out there SO intersted in football that he would like to buy Accrington or Morecambe or Dagenham and Redbridge?
Tumelo13:Gota admit I miss my NONstop #football convo's wit @Denisao_4 and @GordonTyler8! Haha talk bout nothing but the #beautifulgame
Tumelo13:RT @Denisao_4: Ey bra @Tumelo13 that's not a sin! That's for the love of football! I approve wow! Let's hope it works :)?? Amen
Edwardo84:RT @BumbleCricket: Liverpool FC ...what a mess ...greed rears its head again ...football and fans suffer
jonerz97:RT @BumbleCricket: any big shot yank out there SO intersted in football that he would like to buy Accrington or Morecambe or Dagenham and Redbridge?
dcocker11:RT @BumbleCricket: Liverpool FC ...what a mess ...greed rears its head again ...football and fans suffer
AntimoOsato91:@siasduplessis Oros and The Dutch National Football Team could be good sponsors too! Haha :)
IsaacTeka:#football - EURO 2012 qualifier between Germany and Turkey is gonna be a fierce encounter. #Ozil and #Khedira
applenessuk:RT @BumbleCricket: Liverpool FC ...what a mess ...greed rears its head again ...football and fans suffer
johnyrotten:RT @BumbleCricket: any big shot yank out there SO intersted in football that he would like to buy Accrington or Morecambe or Dagenham and Redbridge?
kartikverma:RT @BumbleCricket: Liverpool FC ...what a mess ...greed rears its head again ...football and fans suffer
RawRemedy:RT @BumbleCricket: any big shot yank out there SO intersted in football that he would like to buy Accrington or Morecambe or Dagenham and Redbridge?
TLW1Dan:RT @BumbleCricket: Liverpool FC ...what a mess ...greed rears its head again ...football and fans suffer
jopayne:RT @BumbleCricket: any big shot yank out there SO intersted in football that he would like to buy Accrington or Morecambe or Dagenham and Redbridge?

8. Corpus Generation using twitter
Ex.3 Stream API
RestAPI and SearchAPI only present a limited snapshot of a timeline. During the finals of the 2010 World Cup the rate of tweets containing the tags #Spain, #Netherlands, #Germany, #Uruguay was quite high.
Two options:
• make requests, say, every 2sec through the RestAPI or the Search API,
• BETTER:
  • start listening to a stream of public tweets &
  • filter according to the tag patterns

Corpus Generation using twitter
Ex.3 Stream API
Twitter 4j allows you to retrieve streaming samples using the class TwitterStream. For the public timeline you just need basic authentication.
1 Create a TwitterStream instance
    twitterStream = new TwitterStreamFactory(this).getInstance("yourAcc","yourPass");
2 Set a Listener for receiving the event of a status. Your listener should implement the method public void onStatus(Status status)
    twitterStream.setStatusListener(this);
3 Start Sampling
    twitterStream.sample();
4 Do something with the tweet in your onStatus method

Corpus Generation using twitter
Ex.3 Stream API
– Wait! You will need to complete the code for it to actually do something! - Edit the class:
  ekaw.kasna.twitter.StreamTest
private void startConsuming() throws TwitterException {
    twitterStream.setStatusListener(this);
    //*TODO Using TwitterStream’s filter method,
    //restrict your sampling to collect tweets that include
    //the words: football, worldcup, final
    twitterStream.sample();
}
• Try it yourself: run StreamTest.java

Corpus Generation using twitter
Ex.3 Stream API
– Answer
private void startConsuming() throws TwitterException {
    twitterStream.setStatusListener(this);
    //*TODO Using TwitterStream’s filter method,
    //restrict your sampling to collect tweets that include
    //the words: football, worldcup, final
    String[] filterWords = {"#worldcup", "#WorldCup",
            "#Worldcup", "#WORLDCUP"};
    //filter() opens the filtered stream itself, so sample() is not called afterwards
    twitterStream.filter(0, null, filterWords);
}
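
The answer lists four spellings of the same hashtag because the track words are matched as given. As a complement, a check inside `onStatus` can compare the status text case-insensitively. The sketch below illustrates only that local check; `TagFilter` and `matchesAny` are hypothetical names, not Twitter4J API, and the match is a simple substring test (so "#worldcup" also matches "#worldcup2010").

```java
import java.util.Locale;

/** Hypothetical helper for a case-insensitive tag check inside onStatus. */
final class TagFilter {

    /** Returns true if the text contains any of the tags, ignoring case. */
    static boolean matchesAny(String text, String... tags) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String tag : tags) {
            if (lower.contains(tag.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // In a listener this would be applied to status.getText().
        System.out.println(matchesAny("Great final! #WorldCup", "#worldcup"));
    }
}
```

With such a check, a single lower-case entry per tag is enough on the client side, whatever spelling the author of the tweet used.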

9. Corpus Generation using twitter
Additional Exercise: Authentication
• restrictions to accessing private data!!!
• CHANGES SEP 2010
  • change to authentication mode for retrieving individuals’ status information
  • from a simple username-password to:
  • OAuth-based authentication of registered ‘applications’

Corpus Generation using twitter
• Try it yourself!
  • Authenticating using OAuth
  • OAuthTest.java
  • Using the application “Ekaw-Kasna”
  • Login with your twitter account and go to:
    http://twitter.com/apps/new
  • You will need these two strings for authenticating

Corpus Generation using twitter
• Authenticating using OAuth
  – Running the example requires a PIN
    • enter the URL at the console in a web browser
    • to obtain an oauth_token
  – You will be giving authorization to this application to access your information

11. facebook API: Fetching Objects
• The Graph API
  • provides facilities for reading and writing data to facebook
• Each API request starts with the URL:
  https://graph.facebook.com
• e.g., data about any object can be found by fetching
  https://graph.facebook.com/objectID
  - objectID is the unique id of this object in the social graph
  - e.g., the unique id for a page is its name:
    https://graph.facebook.com/facebook

facebook API: Fetching User data
https://graph.facebook.com/facebook

facebook API: Connections
• All objects in the facebook social graph are connected via relationships (connections)
• Fetch connections
  https://graph.facebook.com/objectID/connection_type
• e.g., the page’s own posts
  https://graph.facebook.com/facebook/posts

12. facebook API: Page Connections
feed            The page’s wall
picture         The page’s profile picture
tagged          The photos, videos, and posts in which this page has been tagged
links           The page's posted links
photos          The photos this page has uploaded
groups          The groups this page is a member of
albums/videos   The photo albums/videos this page has created
statuses        The page's status updates
notes           The page's notes
posts           The page's own posts
members         The page's members. You can only query up to 500 members. It is not possible to iterate through the list. Example: https://graph.facebook.com/{PAGE_ID}/members?limit=500
events          The events this page is attending
checkins        Checkins made by friends of the current session user

facebook API: Filtering Data
• Data can be filtered using parameters
• e.g.,
  - since, until ---> specify date ranges
  - limit ---> specify amount of returned data
• e.g., fetching the feed
  - within specified dates and
  - with a limit of 50
  https://graph.facebook.com/worldcup/feed?since=2010-07-17&until=2010-07-20&limit=50
  - ‘created_time’ is within the specified date ranges

facebook API: Finding Objects
• Search for objects
  https://graph.facebook.com/search?q=query&type=objectType
  - query ---> what you want to find
  - objectType ---> type of the object (e.g. facebook post, user)
• e.g., search all public posts for “2010 world cup”
  https://graph.facebook.com/search?q=2010%20world%20cup&type=post
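
Both request patterns, a connection with filter parameters and a search, reduce to a path under https://graph.facebook.com plus a URL-encoded query string. The following is a minimal sketch of how such URLs could be assembled with the Java standard library; the class `GraphUrls` and its methods are hypothetical helpers, not part of any facebook client library, and the sketch only builds the string without sending a request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles Graph API request URLs. */
final class GraphUrls {
    private static final String BASE = "https://graph.facebook.com/";

    /** Appends the parameters to the path in insertion order. */
    static String build(String path, Map<String, String> params) {
        StringBuilder url = new StringBuilder(BASE).append(path);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(separator)
               .append(encode(e.getKey()))
               .append('=')
               .append(encode(e.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    /** Percent-encodes a value; spaces become %20 as in the slide example. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
    }

    /** Feed of a page restricted by the since, until and limit parameters. */
    static String feed(String page, String since, String until, int limit) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("since", since);
        params.put("until", until);
        params.put("limit", Integer.toString(limit));
        return build(page + "/feed", params);
    }

    /** Search for objects of the given type. */
    static String search(String query, String objectType) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", query);
        params.put("type", objectType);
        return build("search", params);
    }

    public static void main(String[] args) {
        System.out.println(feed("worldcup", "2010-07-17", "2010-07-20", 50));
        System.out.println(search("2010 world cup", "post"));
    }
}
```

Note that the first parameter is introduced with `?` and every later one with `&`; writing `feed&limit=5` instead of `feed?limit=5` leaves the request without a query string.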

13. facebook API: Finding Objects
• q=2010%20world%20cup: posts containing the terms ‘2010’, ‘world’, ‘cup’

facebook API: Graph API Exercise
Try it yourself!
• Fetch the data about the page worldcup
• Get the feed of this page (hint: connection is feed)
  • this is the wall for the page worldcup
• Return only the first 5 messages of this feed
• Search for all pages containing worldcup in the page name

facebook API: Graph API Exercise
• ANSWERS
• Fetch the data about the page worldcup:
  https://graph.facebook.com/worldcup
• Get the feed (wall) of the page worldcup:
  https://graph.facebook.com/worldcup/feed

14. facebook API: Graph API Exercise
• ANSWERS
• Return only the first 5 messages of the feed:
  https://graph.facebook.com/worldcup/feed?limit=5
• Search for all pages containing worldcup in the page name:
  https://graph.facebook.com/search?q=worldcup&type=page

Client Libraries
• Multiple client libraries for facebook API
  http://developers.facebook.com/search?q=User:Client_Libraries
• RestFB client library was the first java library to support the GraphAPI
• Other Java libraries now supporting GraphAPI
  - BatchFB
  - TinyFBGraphClient
  - facebook Java Webapp
• We use the RestFB client library in this tutorial

RestFB API: World Cup Scenario
• Exercise: get the messages sent on the day of the England-Germany match - 27th of June 2010
1 Search for all pages containing “worldcup”
2 For every page:
  • Get the messages posted on that day
  • Store the messages to generate your corpus

16. RestFB API: Exercise
Try it yourself!
• Edit the class SearchTest.java
• Search for all groups talking about a topic of interest to you
• Get the first 15 groups
• For every group:
  - print name and ID

RestFB API: Exercise
ANSWERS
Connection<Group> groupSearch =
    facebookClient.fetchConnection(
        "search", Group.class,
        Parameter.with("q", "2010 world cup"),
        Parameter.with("type", "group"),
        Parameter.with("limit", "15"));
for (Group group : groupSearch.getData()) {
    System.out.println("Name: " + group.getName());
    System.out.println("ID: " + group.getId());
}

RestFB API: return from request - groups
‘2010 world cup’ groups
Name                                      ID
??????? Zamalek Official Group            225521478915
2010 FIFA WORLD CUP                       124179390951456
2010 FIFA World Cup                       2204836745
2010 FIFA WORLD CUP SOUTH AFRICA          27073818457
2010 Fifa World Cup South Afrika          120973211277159
2010 FIFA World Cup South Africa          111708565514436
2010 Fifa World Cup Drinking Game         236128198029
2010 FIFA WORLD CUP SOUTH AFRICA          108528292515380
Mundial 2010 Sudafrica 2010 World cup     199628687363
Italia - 2010 FIFA World Cup              18543473822
2010-FIFA-World-Cup                       126377330712973
2010 World Cup                            112095258835449
2010 World Cup                            193332316373
2010 FIFA World Cup                       130468896965477
2010 FIFA World Cup                       163140881978

RestFB API: Getting the feed
• Step 2:
Connection<T>
    fetchConnection(String connection,
                    Class<T> connectionType,
                    Parameter... parameters)

Connection<Post> myFeed = facebookClient.fetchConnection(
    "worldcup/feed", Post.class, Parameter.with("since",
    "2010-06-27T11:00:00"), Parameter.with("until",
    "2010-06-28T17:00:00"), Parameter.with("limit", "10"));

https://graph.facebook.com/worldcup/feed?since=2010-06-27&until=2010-06-28&limit=20

17. REST API: Getting the feed
Try it yourself! - ConnectionsTest.java
• feed returns all posts written on the specified date
• For each post, attributes returned include:
  – creation time, post name, description…
for (Post post : myFeed.getData()) {
    System.out.println("Message: " + post.getMessage());
    System.out.println("\tCreation Time: " + post.getCreatedTime());
}

REST API: return from request - feed
• Message: the english were hoping to play penalties what a waste of their training time
  Creation Time: Sun Jun 27 17:45:13 BST 2010
• Message: Deutschland, Deutschland über alles, über alles in der Welt
  Creation Time: Sun Jun 27 17:29:25 BST 2010
• Message: world cup?? this wasn't a 'football games' but 'fakeball' games!! Lampard was scored but the referee was blind....4-1?? congrats to the referees coz they have a massive party tonite to celebrate!! $$$$$$$$$$$$$$$$ wow.... even can makes people blind!!! world cup??? **** off!!!
  Creation Time: Sun Jun 27 17:25:32 BST 2010
• Message: how are we suppose to be patriotic with a team that plays like that, none of them deserve the money they get, waste of time..............
  Creation Time: Sun Jun 27 16:48:06 BST 2010
• Message: john terry on england should get worst defender for the year...he's no good
  Creation Time: Sun Jun 27 16:42:39 BST 2010

REST API: Post Properties, Connections
All properties & connections of a ‘Post’
Properties
id                    The post ID
from                  An object containing the ID and name of the user who posted the message
to                    A list of the profiles mentioned or targeted in this post
message               The message
picture               If available, a link to the picture included with this post
link                  The link attached to this post
name                  The name of the link
caption/description   The caption/description of the link (appears beneath the link name)
source                If available, the source link attached to this post (e.g., a flash or video file)
icon                  A link to an icon representing the type of this post
attribution           A string indicating which application was used to create this post
actions               A list of available action names and links (including commenting, liking and an optional app-specified action)
likes                 The number of likes on this post
created_time          The time the post was initially published
updated_time          The time of the last comment on this post
Connections
comments              All of the comments on this post

Corpus Generation using facebook
Additional Exercise: Authentication
• restrictions to accessing private data!!!
• Access Token required for some methods
  • to prevent access (read or write) to private data
  • e.g., publishing to the facebook social graph
• Biddington provides a good explanation for getting access tokens at:
  http://benbiddington.wordpress.com/2010/04/23/facebook-graph-api-getting-access-tokens
• e.g., fetch the friends of user “khadija.elbedweihy”
  • this requires an authentication token
  https://graph.facebook.com/khadija.elbedweihy/friends?access_token=11585905509...
• Try it yourself...

21. Content Analysis: Natural Language Analysis
• Sentence segmentation
  – Input: a single message
    Rooney fails to end goal drought. Wayne Rooney's trip to South Africa 2010 began with high expectations but he leaves without a single goal scored after three group matches and a 1-4 defeat to Germany.
  – Output: a list of sentences
    Rooney fails to end goal drought. | Wayne Rooney's trip to South Africa 2010 began with high expectations but he leaves without a single goal scored after three group matches and a 1-4 defeat to Germany.
Try it yourself! - SentenceSegmentation.java

Content Analysis: Natural Language Analysis
• Sentence segmentation using OpenNLP
/* Input */ (LINE 17)
String pathToInput = "../../data/examples/example1.txt";
String content = "…";
/* Creates an object of OpenNLP sentence segmentation detector */
SentenceDetector detector = new SentenceDetector("lib/opennlp/models/EnglishSD.bin.gz");
/* Call the actual method to identify the end offsets of sentences. */
int[] result = detector.sentPosDetect(content);
/* Print out the sentences */
int start=0, i=0;
do {
    ……
} while(start<result[result.length-1]);
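
The loop above slices the message at the end offsets returned by the detector. To show that offset-slicing step in isolation, the sketch below swaps in `java.text.BreakIterator` from the Java standard library as a stand-in for the OpenNLP detector; it is not the tutorial's SentenceSegmentation.java, and its rule-based boundaries may differ from those of the trained OpenNLP model. The class name `SentenceSplitSketch` is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Stand-in sketch: sentence end offsets via BreakIterator, then slicing. */
final class SentenceSplitSketch {

    /** Returns the end offset of every sentence in the content. */
    static List<Integer> sentenceEnds(String content) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(content);
        List<Integer> ends = new ArrayList<Integer>();
        it.first();
        for (int end = it.next(); end != BreakIterator.DONE; end = it.next()) {
            ends.add(end);
        }
        return ends;
    }

    /** Slices the content at the end offsets, as the do/while loop does. */
    static List<String> split(String content) {
        List<String> sentences = new ArrayList<String>();
        int start = 0;
        for (int end : sentenceEnds(content)) {
            String sentence = content.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
            start = end; // the next sentence starts where this one ended
        }
        return sentences;
    }

    public static void main(String[] args) {
        String content = "Rooney fails to end goal drought. Wayne Rooney's trip to "
                + "South Africa 2010 began with high expectations but he leaves "
                + "without a single goal scored after three group matches and a "
                + "1-4 defeat to Germany.";
        for (String sentence : split(content)) {
            System.out.println(sentence);
        }
    }
}
```

Iterating over the list of offsets avoids the index bookkeeping of the do/while form, which has to stop once `start` reaches the last offset.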

Content Analysis: Natural Language Analysis
• Tokenisation
  – Input: a single sentence, or message
    Rooney fails to end goal drought.
  – Output: a list of tokens
    Rooney, fails, to, end, goal, drought, .
Try it yourself! - Tokenisation.java

Content Analysis: Natural Language Analysis
• Tokenisation using OpenNLP
/* Input text message */ (LINE 28)
String content = "…"; // read in the text content from "example1.txt"
List<String> sentences = new ArrayList<String>();
……
/* Code for splitting sentences */
/* Creates an object of OpenNLP tokeniser using a pre-built English language
model. */
//change the path accordingly
String pathToEngTokenisationModel = "lib/opennlp/models/EnglishTok.bin.gz";
Tokenizer tokeniser = new Tokenizer(pathToEngTokenisationModel);
/* Tokenise each sentence and print out the result */
for (String sentence : sentences) {
    String[] result = tokeniser.tokenize(sentence);
    for (String tok : result)
        System.out.println(tok);
}

22. Content Analysis: Natural Language Analysis
• Part of speech tagging
  – Input: a list of tokens
    Rooney, fails, to, end, goal, drought, .
  – Output: a list of tokens with their part of speech tag
    Rooney/NNP fails/VBZ to/TO end/VB goal/NN drought/NN ./.
Try it yourself! - POSTagger.java

Content Analysis: Natural Language Analysis
• POS tagging using OpenNLP
/* Input text message */ (LINE 31)
String content = "…"; //read in the text content from example1.txt
List<String> tokens = new ArrayList<String>();
/* Code for tokenisation and add the result into the list object above.
You do not need to do sentence segmentation in this case, because the
tokenisation will detect sentence boundary as a first step */
/* Creates an object of OpenNLP POS tagger using a pre-built English
language model. */
//change the path accordingly
String pathToEngPOSModel = "lib/opennlp/models/tag.bin.gz";
/* You MAY specify additionally two parameters for the constructor, i.e.,
TagDictionary and Dictionary. */
PosTagger tagger = new PosTagger(pathToEngPOSModel, (Dictionary)null);
/* Tag the list of tokens and print out the result */
String[] result = tagger.tag(tokens.toArray(new String[0]));
for (String tag : result)
    System.out.println(tag);
Output:
Rooney/NNP fails/VBZ to/TO end/VB goal/NN drought/NN ./.

Content Analysis: Phrase Chunking
• Goal: identifying information units that make good candidate terms of our interest
• In this exercise, we focus on noun phrases
  – which often bear important domain-specific information
• Input
  – POS-tagged tokens
• Output
  – Noun phrases
Exercise
Try it yourself!
edit the class PhraseChunker.java and run

Content Analysis: Natural Language Analysis
• Phrase chunking
  – Input: a list of POS-tagged tokens
    Rooney/NNP fails/VBZ to/TO end/VB goal/NN drought/NN ./.
  – Output: a list of phrases (nouns/verb phrases)
    Rooney, goal drought

23. Content Analysis: Natural Language Analysis
• Phrase chunking using OpenNLP
(LINE 32 in PhraseChunker.java)
//initialising all required NLP processors. If you get an out of memory
//exception, try increasing your JVM heap space to at least 256MB
String pathToEngTokenisationModel = "lib/opennlp/models/EnglishTok.bin.gz";
String pathToEngPOSModel = "lib/opennlp/models/tag.bin.gz";
String pathToEngPhraseModel = "lib/opennlp/models/EnglishChunk.bin.gz";
SentenceDetector detector = new SentenceDetector("lib/opennlp/models/EnglishSD.bin.gz");
Tokenizer tokeniser = new Tokenizer(pathToEngTokenisationModel);
PosTagger tagger = new PosTagger(pathToEngPOSModel, (Dictionary) null);
TreebankChunker chunker = new TreebankChunker(pathToEngPhraseModel);

Content Analysis: Natural Language Analysis
• Phrase chunking using OpenNLP
(LINE 44 in PhraseChunker.java)
int[] result = detector.sentPosDetect(content);
int start = 0, i = 0;
do {
    //sentence splitting
    String sentence = content.substring(start, result[i]);
    //TODO: tokenization, put tokens in a String array. Hint:
    //Tokenisation.java
    String[] tokens = null;
    //TODO: POS tagging, put tags in a String array. Hint: POSTagger.java
    String[] tags = null;
    //This is the method you use to chunk phrases on a list of tokens and
    //a list of tags
    String[] phrases = chunker.chunk(tokens, tags);
    //See the result
    for (String p : phrases)
        System.out.println(p);
    ……
    start = result[i];
    i++;
} while (start < result[result.length - 1]);

Content Analysis: Natural Language Analysis
• Phrase chunking using OpenNLP
(LINE 44 in PhraseChunker.java)
int[] result = detector.sentPosDetect(content);
int start = 0, i = 0;
do {
    //sentence splitting
    String sentence = content.substring(start, result[i]);
    //TODO: tokenization, put tokens in a String array.
    String[] tokens = null;
    //TODO: POS tagging, put tags in a String array. Hint: POSTagger.java
    String[] tags = null;
    //This is the method you use to chunk phrases on a list of tokens and
    //a list of tags
    String[] phrases = chunker.chunk(tokens, tags);
    //See the result
    for (int k = 0; k < phrases.length; k++) {
        System.out.println(phrases[k] + "\t\t" + tokens[k]);
    }
    ……
    start = result[i];
    i++;
} while (start < result[result.length - 1]);

The result is not exactly the phrases we expected, but a list of ‘tags’, which are commonly used in NLP phrase chunking:
  B – “begin”
  I – “inside”
  NP – “Noun phrase”
  VP – “Verb phrase”

  B-NP   Rooney     ->  Rooney
  B-VP   fails
  I-VP   to         ->  fails to end
  I-VP   end
  B-NP   goal
  I-NP   drought    ->  goal drought

Content Analysis: Natural Language Analysis
• Phrase chunking using OpenNLP
Code from line 78 onwards processes this result and generates the real phrases
(LINE 78 in PhraseChunker.java)
String npstart = "B-NP";
String vpstart = "B-VP";
String npcontinue = "I-NP";
String vpcontinue = "I-VP";
String other = "O";
String phrase = "";
for (int n = 0; n < tokens.length; n++) {
    if (phrases[n].equals(npstart) || phrases[n].equals(vpstart)) {
        phrase = tokens[n];
        for (int m = n + 1; m < tokens.length; m++) {
            if (phrases[m].equals(npcontinue) ||
                    phrases[m].equals(vpcontinue)) {
                phrase = phrase+" "+tokens[m];
            } else {
                System.out.println("Actual phrase: "+phrase);
                phrase = "";
                break;
                ...
}
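
The B/I/O tags can be turned into phrases in a single pass over the tokens. The following is a minimal, self-contained sketch of that idea; it is not the code from PhraseChunker.java, and the class `ChunkAssembler` is a hypothetical name. It assumes the tag array is aligned with the token array, treats any tag starting with "B-" as the start of a phrase and any tag starting with "I-" as its continuation (without checking that the NP/VP types match), and also emits a phrase that is still open when the sentence ends.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: groups tokens into phrases according to their B-/I-/O chunk tags. */
final class ChunkAssembler {

    static List<String> assemble(String[] tokens, String[] tags) {
        List<String> phrases = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int n = 0; n < tokens.length; n++) {
            String tag = tags[n];
            if (tag.startsWith("B-")) {
                // a new phrase begins: close the previous one first
                flush(current, phrases);
                current.append(tokens[n]);
            } else if (tag.startsWith("I-") && current.length() > 0) {
                // the token continues the phrase that is currently open
                current.append(' ').append(tokens[n]);
            } else {
                // "O" (or a stray I- tag) ends any open phrase
                flush(current, phrases);
            }
        }
        flush(current, phrases); // a phrase may run to the end of the sentence
        return phrases;
    }

    private static void flush(StringBuilder current, List<String> phrases) {
        if (current.length() > 0) {
            phrases.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"Rooney", "fails", "to", "end", "goal", "drought", "."};
        String[] tags = {"B-NP", "B-VP", "I-VP", "I-VP", "B-NP", "I-NP", "O"};
        for (String phrase : assemble(tokens, tags)) {
            System.out.println("Actual phrase: " + phrase);
        }
    }
}
```

The final `flush` matters for messages without closing punctuation, which are common in tweets: without it, a phrase that ends on the last token would never be reported.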

24. Content Analysis: Natural Language Analysis
• Phrase chunking using OpenNLP
  – The answer…
(LINE 44 in PhraseChunker.java)
int[] result = detector.sentPosDetect(content);
int start = 0, i = 0;
do {
    //sentence splitting
    String sentence = content.substring(start, result[i]);
    //TODO: tokenization, put tokens in a String array.
    String[] tokens = tokeniser.tokenize(sentence);
    //TODO: pos tagging, put tags in a String array.
    String[] tags = tagger.tag(tokens);
    //This is the method you use to chunk phrases on a list of tokens
    //and a list of tags
    String[] phrases = chunker.chunk(tokens, tags);
    //See the result
    for (String p : phrases)
        System.out.println(p);
    ……
    start = result[i];
    i++;
} while (start < result[result.length - 1]);
Output:
B-NP   Rooney
B-VP   fails
I-VP   to
I-VP   end
B-NP   goal
I-NP   drought
O      .
Actual phrase: Rooney
Actual phrase: fails to end
Actual phrase: goal drought

More exercises if you are interested
• Repeat previous tasks using the corpus generated using the twitter and facebook APIs
• Try:
  – Sentence segmentation
  – Tokenisation
  – Part-of-speech tagging
  – Phrase chunking

Next
• To analyse the content and extract important terms, we follow these steps
  – Natural language analyses of each message (tokenisation, POS tagging)
  – Identify candidate information units of interest (phrase chunking, entity recognition)
  – Identify statistically important information (term recognition)

Domain Term Recognition
• Goal: extract statistically significant terms, which collectively determine the summary of the match
• Recap: domain term recognition procedure
  – NLP processes to identify candidate lexicons, e.g., noun-phrases, entities
  – Statistical measures to evaluate the significance of candidate lexicons
    • term frequency; tf-idf; weirdness, glossex, c-value, termex
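
Of the statistical measures listed, term frequency and tf-idf are the simplest to illustrate. The sketch below treats each message as a list of candidate terms and scores a term by its count in one message multiplied by ln(N / df), where N is the number of messages and df the number of messages containing the term. This is one common tf-idf variant chosen here as an assumption, not necessarily the weighting used in the tutorial; the class `TfIdfSketch` and the three-message sample corpus are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of term frequency and one tf-idf variant over a list of messages. */
final class TfIdfSketch {

    /** Number of times the term occurs in one message. */
    static int termFrequency(String term, List<String> message) {
        int count = 0;
        for (String candidate : message) {
            if (candidate.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** Number of messages that contain the term at least once. */
    static int documentFrequency(String term, List<List<String>> corpus) {
        int count = 0;
        for (List<String> message : corpus) {
            if (message.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** tf * ln(N / df); zero when the term occurs in no message. */
    static double tfIdf(String term, List<String> message, List<List<String>> corpus) {
        int df = documentFrequency(term, corpus);
        if (df == 0) {
            return 0.0;
        }
        return termFrequency(term, message) * Math.log((double) corpus.size() / df);
    }

    /** Hypothetical corpus of three already-tokenised messages. */
    static List<List<String>> sampleCorpus() {
        return Arrays.asList(
                Arrays.asList("rooney", "fails", "to", "end", "goal", "drought"),
                Arrays.asList("goal", "goal", "penalty"),
                Arrays.asList("rooney", "goal"));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = sampleCorpus();
        for (String term : new String[]{"goal", "rooney", "drought"}) {
            System.out.println(term + "\t" + tfIdf(term, corpus.get(0), corpus));
        }
    }
}
```

A term such as "goal" that occurs in every message receives a score of zero, while a term confined to a single message scores highest, which is the behaviour wanted when looking for terms that characterise a particular match.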