3. Giving a real-time geo talk at @where20. How do you build stuff? #rtgeo.
19 Apr via Twitter for iPhone
from Santa Clara Convention Center
5001 Great America Parkway
Santa Clara, CA 95054
4. Background
[] raffi@~/: cat /etc/services | grep wherehoo
wherehoo        5859/tcp    # WHEREHOO
wherehoo        5859/udp    # WHEREHOO
Wherehoo (2000)
⇢ “The Stuff Around You”
⇢ “Wherehoo Server: An interactive location service for software agents and intelligent systems” - J. Youll, R. Krikorian
⇢ In your /etc/services file!
BusRadio (2004)
⇢ Designed mobile computers to play media while also transmitting telemetry
⇢ Looked and sounded like a radio - but really a Linux computer
OneHop (2007)
⇢ Bluetooth proximity-based social networking
5. Background
Twitter
⇢Originally tech lead of API / Platform team
⇢Built the first geo-based infrastructure before the acquisition of
Mixer Labs in December 2009
⇢Now lead of the Application Services group
⇢Runs five teams focused on scalable infrastructure around
“core” data objects
⇢Tweets, users, timelines, places, etc.
⇢Delivery, authentication, APIs, etc.
7. Table of contents
Background
⇢ Why are we interested in this?
Twitter’s geo APIs
⇢ How do we allow people to talk about place?
⇢ Context around “place”
Problem statement
⇢ What do we want our system to do?
Infrastructure
⇢ How is Twitter solving this problem?
15. Original attempts
Adding it to the tweet
⇢ Use myloc.me, et al., to add text to the tweet
⇢ Puts location “in band”
⇢ Takes from the 140 characters
Setting profile level locations
⇢ Set the user/location of a Twitter user
⇢ There’s an API for that!
⇢ Not a per-tweet basis
⇢ Not intended for high frequency alterations
19. Geotagging API
Adding it to the tweet
⇢ Per-tweet basis
⇢ Out of band and pure metadata
⇢ Does not take from the 140 characters
Native Twitter support
⇢ Simple way to update status with location data
⇢ Ability to remove geotags from your tweets en masse
⇢ Using GeoRSS and GeoJSON as the encoding format
⇢ Across all Twitter APIs (REST, Search, and Streaming)
21. geocode
The Search geocode parameter takes “latitude,longitude,radius”, where radius has units of mi or km
[] raffi@~/: curl "http://search.twitter.com/search.atom?
geocode=40.757929%2C-73.985506%2C25km&source=foursquare"
...
<title>On the way to ace now, so whenever you can make it I'll be there. (@ Port Imperial Ferry in Weehawken) http://4sq.com/2rq0vO</title>
...
<twitter:geo>
<georss:point>40.7759 -74.0129</georss:point>
</twitter:geo>
...
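The request and response above can be reproduced in a few lines. The sketch below builds the geocode query (urlencode escapes the commas as %2C, exactly as in the curl command), parses a georss:point body (GeoRSS orders it "latitude longitude"), and checks the returned point against the search radius with a haversine distance on a spherical Earth. The function names are hypothetical, and the 2010-era endpoint is simply the one shown on the slide.

```python
import math
from urllib.parse import urlencode

SEARCH_URL = "http://search.twitter.com/search.atom"  # endpoint from the curl example

def geocode_search_url(lat, lon, radius, unit="km", **extra):
    """Build a Search URL whose geocode parameter is "latitude,longitude,radius"."""
    if unit not in ("km", "mi"):
        raise ValueError("radius unit must be 'km' or 'mi'")
    params = {"geocode": f"{lat},{lon},{radius}{unit}", **extra}
    # urlencode percent-escapes each comma as %2C
    return f"{SEARCH_URL}?{urlencode(params)}"

def parse_georss_point(text):
    """A georss:point body is "latitude longitude", separated by whitespace."""
    lat, lon = (float(v) for v in text.split())
    return lat, lon

def haversine_km(lat1, lon1, lat2, lon2, earth_radius_km=6371.0):
    """Great-circle distance, treating the Earth as a sphere (an approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))
```

For the tweet shown, the point 40.7759 -74.0129 lies roughly 3 km from the query center, well inside the 25km radius.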
27. location filtering
[] raffi@~/: curl "http://stream.twitter.com/1/statuses/filter.xml?
locations=-74.5129,40.2759,-73.5019,41.2759"
locations is a bounding box specified by “long1,lat1,long2,lat2” and can track up to 10 locations that are at most 1 degree square (~60 miles square and enough to cover most metropolitan areas)
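A minimal sketch of preparing the locations value and of the matching rule it implies. It assumes each box is given southwest corner first, then northeast (as in the example above) and enforces the slide's limit of 10 boxes; the 1-degree size limit is left to the server. The helper names are hypothetical, and the client-side check ignores boxes that cross the antimeridian.

```python
MAX_BOXES = 10  # per the slide: up to 10 bounding boxes can be tracked

def format_locations(boxes):
    """Join (sw_lon, sw_lat, ne_lon, ne_lat) boxes into one comma-separated value."""
    if len(boxes) > MAX_BOXES:
        raise ValueError(f"at most {MAX_BOXES} bounding boxes are allowed")
    parts = []
    for sw_lon, sw_lat, ne_lon, ne_lat in boxes:
        if not (sw_lon < ne_lon and sw_lat < ne_lat):
            raise ValueError("expected the southwest corner first, then the northeast")
        parts.extend(str(v) for v in (sw_lon, sw_lat, ne_lon, ne_lat))
    return ",".join(parts)

def matches_any_box(lon, lat, boxes):
    """Client-side check: does a (lon, lat) point fall inside any box (edges inclusive)?"""
    return any(sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat
               for sw_lon, sw_lat, ne_lon, ne_lat in boxes)
```

Note the longitude-first order here, which is the reverse of the geocode parameter and of GeoRSS.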
31. Trends API
Global Trends
⇢Analysis of “hot conversations”
⇢Does not take from the 140 characters
Location specific trends
⇢Tweets being localized through a variety of means internally
⇢Locations exposed over the API as WOEIDs and Twitter IDs
⇢Can ask for available trends sorted by distance
32. available locations
[] raffi@~/: curl "http://api.twitter.com/1/trends/available.xml"
<locations type="array">
<location>
<woeid>2487956</woeid>
<name>San Francisco</name>
<placeTypeName code="7">Town</placeTypeName>
<country type="Country" code="US">United States</country>
<url>http://where.yahooapis.com/v1/place/2487956</url>
</location>
...
</locations>
Can optionally take a lat and long parameter to have trend locations returned sorted by distance from you.
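Pulling the WOEIDs out of this response takes only the standard library. The sketch below assumes the element layout shown above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_available_locations(xml_text):
    """Return (woeid, name, country_code) tuples from a trends/available.xml body."""
    root = ET.fromstring(xml_text)
    results = []
    for loc in root.iter("location"):
        woeid = int(loc.findtext("woeid"))
        name = loc.findtext("name")
        country = loc.find("country")
        code = country.get("code") if country is not None else None
        results.append((woeid, name, code))
    return results
```

Each WOEID returned this way is the identifier used in the per-location trends request.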
33. Look up a local trend at a given WOEID
[] raffi@~/: curl "http://api.twitter.com/1/trends/2487956.xml"
<matching_trends type="array">
<trends as_of="2009-12-15T20:19:09Z">
...
<trend url="http://search.twitter.com/search?q=Golden+Globe+nominations" query="Golden+Globe+nominations">Golden Globe nominations</trend>
<trend url="http://search.twitter.com/search?q=%23somethingaintright" query="%23somethingaintright">#somethingaintright</trend>
...
</trends>
</matching_trends>
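The query attribute is form-encoded ("+" for spaces, %23 for "#"), so it needs decoding before it can be reused. A small sketch, assuming the URL pattern and element names shown above; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

def trends_url(woeid):
    """Per-location trends URL, following the pattern in the curl example."""
    return f"http://api.twitter.com/1/trends/{int(woeid)}.xml"

def parse_trends(xml_text):
    """Return (display_name, decoded_query) pairs from a matching_trends body."""
    root = ET.fromstring(xml_text)
    # unquote_plus turns "+" into spaces and "%23" into "#"
    return [(t.text, unquote_plus(t.get("query", ""))) for t in root.iter("trend")]
```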
35. A place is a name
5001 Great America Parkway, Santa Clara, CA 95054
Great America Parkway and Tasman Drive
The Bay Area
Santa Clara convention center
Twitter ID 3b7dd0d93e661e18
37. Sharing coordinates
More aptly named “geotagging”
Good for sharing photos
Possibly good for talking about a specific place
(e.g. store, restaurant)
People don’t understand numbers, and without
a map there is a lack of context
Huge privacy implications
38. Sharing polygons
Privacy implications are
potentially better
If you thought sharing one pair
of numbers was bad...
Questions around polygon
definition
Still unable to visualize unless
on a map
39. Sharing names
Has the potential to make a connection with users
Distinguishes a “named place” from simply a “place”
Inverse relationship between granularity and connection
Rather large internationalization / context implications
41. Geo-place API
Support for “names”
⇢Not just coordinates
⇢More contextually relevant
⇢Positive privacy benefits
Increased complexity
⇢Need to be able to look up a list of places
⇢Requires a “reverse geocoder”
⇢Human-driven tagging; not possible to make fully automatic
46. what do we need to build?
Database of places
⇢Given a real-world location, find places
⇢Spatial search
Method to store places with content
⇢Per user basis
⇢Per tweet basis
48. as background... MySQL + GIS
Ability to index points and do a spatial query
⇢For example, get points within a bounding rectangle
⇢SELECT coord FROM geometry WHERE MBRContains(GeomFromText('POLYGON((0 0, 0 3, 3 3, 3 0, 0 0))'), coord)
Hard to cache the spatial query
Possibly requires a DB hit on every query
49. options
Grid / quad-tree
⇢ Create a grid (possibly nested) of the entire Earth
Geohash
⇢ Arbitrarily precise and hierarchical spatial data reference
Space filling curves
⇢ Mapping 2D space into 1D while preserving locality
R-Tree
⇢ Spatial access data structure
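To make the grid / quad-tree option concrete, here is a sketch of a nested grid over the whole Earth: each level splits the current latitude and longitude ranges in half, and the quadrant digits (0 = NW, 1 = NE, 2 = SW, 3 = SE, a numbering chosen for this sketch) form a key. It subdivides plain latitude/longitude and is not any particular tiling scheme; the function name is hypothetical.

```python
def quadtree_key(lat, lon, depth):
    """Return quadrant digits (0-3) naming the nested grid cell that holds (lat, lon).

    A key of length d names one of 4**d cells, and a cell's key is a prefix of
    every key nested inside it.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    digits = []
    for _ in range(depth):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        quadrant = 0
        if lon >= lon_mid:   # east half
            quadrant += 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        if lat < lat_mid:    # south half
            quadrant += 2
            lat_hi = lat_mid
        else:
            lat_lo = lat_mid
        digits.append(str(quadrant))
    return "".join(digits)
```

Two nearby points such as (37.3, -121.9) and (37.31, -121.91) both land in cell "0210" at depth 4.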
53. geohash
37°18′N 121°54′W = 9q9k4
Hierarchical spatial data structure
Precision encoded
Distance captured
⇢Nearby places (usually) share the same prefix
⇢The longer the string match, the closer the places are
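The example hash can be reproduced with the standard geohash encoding: repeatedly bisect the longitude and latitude ranges, interleaving one bit from each (longitude first), and emit one base-32 character per 5 bits. 37°18′N 121°54′W is (37.3, -121.9) in decimal degrees. This is a minimal sketch, not tuned for speed.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=5):
    """Encode (lat, lon) as a geohash by bisecting ranges and interleaving bits."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0        # bits accumulated for the current character
    bit_count = 0
    even = True     # True -> this bit refines longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

geohash_encode(37.3, -121.9) gives "9q9k4"; asking for more characters only appends to that prefix. The "(usually)" matters: two points on opposite sides of a cell boundary can be very close yet share a short prefix or none at all.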
55. Geohash
Possible to do range query in database
⇢Matching based on prefix will return all the points that fit in
the “grid”
⇢Able to store 2D data in a 1D space
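A prefix match becomes an ordinary range scan that a B-tree index can serve. The sketch uses SQLite from the standard library; the table, its rows, and all geohashes except 9q9k4 are hypothetical. Because geohash characters (0-9, a-z) all sort before "~", the half-open range [prefix, prefix + "~") holds exactly the strings that start with the prefix.

```python
import sqlite3

def places_with_prefix(conn, prefix):
    """Return names of rows whose geohash starts with `prefix`, via a range query."""
    rows = conn.execute(
        "SELECT name FROM places WHERE geohash >= ? AND geohash < ? ORDER BY geohash",
        (prefix, prefix + "~"),
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, geohash TEXT)")
conn.execute("CREATE INDEX places_geohash ON places (geohash)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("place-a", "9q9k4"), ("place-b", "9q9k5"), ("place-c", "9q9hz"), ("place-d", "dr5ru")],
)
```

Shortening the prefix widens the "grid" cell: "9q9k" returns place-a and place-b, while "9q9" also picks up place-c.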
57. Space filling curve
Generalization of geohash
⇢2D to 1D mapping
⇢Nearness is captured
Can recursively fill up space,
depending on the resolution required
Fractal-like pattern can be used
to take up as much room as
possible
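Geohash's bit interleaving traces a Z-order (Morton) curve. The Hilbert curve is another space-filling curve whose consecutive cells are always neighbors, so it preserves locality somewhat better. The sketch below follows the well-known iterative (x, y) to distance conversion for an n x n grid, with n a power of two; it is illustrative and not something the slides say Twitter used.

```python
def hilbert_index(n, x, y):
    """Distance along a Hilbert curve filling an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

On a 2 x 2 grid the cells (0,0), (0,1), (1,1), (1,0) get indexes 0 to 3. Doubling n refines every cell into four, which is the recursive filling the slide describes.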
62. How do you store precision?
“Precision” is a hard thing to encode
Accuracy can be encoded with an error radius
Twitter opts to track the number of decimal places passed in
⇢140.0 != 140.00
⇢DecimalTrackingFloat
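The slide names DecimalTrackingFloat but does not show it, so the following is a hypothetical re-creation, not Twitter's class. The value keeps the number of decimal places it was given, and that count is part of equality, so 140.0 and 140.00 compare unequal. It handles plain decimal strings only (no exponents).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecimalTrackingFloat:
    """Hypothetical sketch: a float that remembers how many decimals it was given."""
    value: float
    decimals: int

    @classmethod
    def parse(cls, text):
        text = text.strip()
        _, dot, fraction = text.partition(".")
        return cls(float(text), len(fraction) if dot else 0)

    def __str__(self):
        # Re-emit with the original number of decimal places
        return f"{self.value:.{self.decimals}f}"
```

The dataclass-generated equality compares both fields, and the frozen instances are hashable, so they can serve as dictionary keys.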
66. Twitter infrastructure
Ruby on Rails-ish frontend
Scala-based services backend
MySQL (and soon Cassandra) as the store
RPC to back-end or put items into queues
68. Simplified architecture
R-Tree for spatial lookup
⇢Data provider for front-end lookups
⇢Store place object with envelope of place in R-Tree
Mapping from ID to place object
69. Java Topology Suite (JTS)
http://www.vividsolutions.com/jts/jtshome.htm
Open source
Good for representing and manipulating “geometries”
Has support for fundamental geometric operations
⇢ contains
⇢ envelope
Has an R-Tree implementation
70.
pointInside in polygon? true
pointOutside in polygon? false
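Where JTS is not available, the same true/false answer can come from the classic even-odd (ray casting) test: cast a ray from the point toward +x and count edge crossings. A minimal sketch with a hypothetical function name; points lying exactly on an edge can fall either way, so boundary cases need their own rule (JTS distinguishes contains from covers for this reason).

```python
def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test: is (x, y) inside the polygon `ring`?

    `ring` is a list of (x, y) vertices; repeating the first vertex at the end is optional.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

The test works for concave shapes too: in an L-shaped polygon, a point in the notch is correctly reported as outside.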
71.
at (0.0, 0.0)
  -- region 1
at (1.0, 1.0)
  -- region 1
  -- region 2
at (2.0, 2.0)
  -- region 1
  -- region 2
at (3.0, 3.0)
  -- region 2
at (4.0, 4.0)
  -- empty
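The slide does not show the region shapes. Assuming two overlapping squares, region 1 from (0,0) to (2,2) and region 2 from (1,1) to (3,3), the sketch below reproduces the output above using the simplified architecture: a mapping from ID to place object, plus envelopes queried with boundary-inclusive tests. A linear scan stands in for the R-tree here. For non-rectangular places, envelope hits are only candidates and would need an exact point-in-polygon check afterward. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Envelope:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def covers(self, x, y):
        # Boundary-inclusive, like an envelope (MBR) query
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y

class PlaceIndex:
    """ID -> place mapping plus an envelope index (linear scan standing in for an R-tree)."""

    def __init__(self):
        self.places = {}     # place id -> place object (here just a name)
        self.envelopes = []  # (envelope, place id) in insertion order

    def add(self, place_id, name, envelope):
        self.places[place_id] = name
        self.envelopes.append((envelope, place_id))

    def get(self, place_id):
        return self.places[place_id]

    def contained_within(self, x, y):
        return [self.places[pid] for env, pid in self.envelopes if env.covers(x, y)]

def report(index, points):
    lines = []
    for x, y in points:
        lines.append(f"at ({x:.1f}, {y:.1f})")
        names = index.contained_within(x, y) or ["empty"]
        lines.extend(f"  -- {name}" for name in names)
    return lines

index = PlaceIndex()
index.add(1, "region 1", Envelope(0, 0, 2, 2))  # hypothetical extent
index.add(2, "region 2", Envelope(1, 1, 3, 3))  # hypothetical extent
for line in report(index, [(float(i), float(i)) for i in range(5)]):
    print(line)
```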
72. Java Topology Suite (JTS)
Serializers and deserializers
⇢Well-known text (WKT)
⇢Well-known binary (WKB)
⇢No GeoRSS or GeoJSON support
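Because JTS stops at WKT/WKB, the missing encodings have to be added by hand. Here is a minimal sketch that converts WKT POINT and POLYGON strings into GeoJSON geometry dicts (pass the result to json.dumps for text) and emits a GeoRSS point. WKT and GeoJSON both put x (longitude) first, while GeoRSS puts latitude first. Only these two geometry types are handled, and the function names are hypothetical.

```python
import re

_GEOM = re.compile(r"\s*(POINT|POLYGON)\s*(\(.*\))\s*", re.IGNORECASE | re.DOTALL)
_RING = re.compile(r"\(([^()]*)\)")  # innermost parenthesized coordinate lists

def _coords(text):
    # "x1 y1, x2 y2, ..." -> [[x1, y1], [x2, y2], ...]
    return [[float(v) for v in pair.split()] for pair in text.split(",")]

def wkt_to_geojson(wkt):
    """Convert a WKT POINT or POLYGON string to a GeoJSON geometry dict."""
    m = _GEOM.fullmatch(wkt)
    if not m:
        raise ValueError("only POINT and POLYGON are supported in this sketch")
    kind, body = m.group(1).upper(), m.group(2)
    rings = [_coords(r) for r in _RING.findall(body)]
    if kind == "POINT":
        return {"type": "Point", "coordinates": rings[0][0]}
    return {"type": "Polygon", "coordinates": rings}  # outer ring, then any holes

def to_georss_point(geojson_point):
    """GeoRSS points are "latitude longitude", the reverse of GeoJSON order."""
    lon, lat = geojson_point["coordinates"][:2]
    return f"{lat} {lon}"
```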
73. interface / RPC
RockDove is a backend service
⇢Data provider for front-end lookups
⇢Uses some form of RPC (Thrift, Avro, etc.) to communicate
⇢Data could be cached on the front end to avoid repeated lookups
Simple RPC interface
⇢get(id)
⇢containedWithin(lat, long)
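The two-call interface is small enough to write as a Python Protocol. The sketch pairs it with a front-end wrapper that caches get(id) in a bounded LRU map, so repeated lookups of the same place skip the RPC. Point queries are not cached in this sketch. RockDove's real interface is defined in whatever IDL the service uses; every name here, including the in-memory backend and its bounding boxes, is hypothetical.

```python
from collections import OrderedDict
from typing import List, Protocol

class PlaceService(Protocol):
    """Hypothetical Python rendering of the slide's RPC interface."""
    def get(self, place_id: str) -> dict: ...
    def contained_within(self, lat: float, lon: float) -> List[str]: ...

class InMemoryBackend:
    """Stand-in for the remote service; bounding boxes are made up."""
    def __init__(self, places):
        self.places = places  # id -> {"name": ..., "bbox": (min_lat, min_lon, max_lat, max_lon)}

    def get(self, place_id):
        return self.places[place_id]

    def contained_within(self, lat, lon):
        return [pid for pid, p in self.places.items()
                if p["bbox"][0] <= lat <= p["bbox"][2] and p["bbox"][1] <= lon <= p["bbox"][3]]

class CachingPlaceClient:
    """Front-end wrapper that caches get(id) results in a bounded LRU map."""
    def __init__(self, backend: PlaceService, max_entries: int = 10_000):
        self.backend = backend
        self.max_entries = max_entries
        self._cache = OrderedDict()
        self.rpc_calls = 0

    def get(self, place_id):
        if place_id in self._cache:
            self._cache.move_to_end(place_id)  # mark as recently used
            return self._cache[place_id]
        self.rpc_calls += 1
        place = self.backend.get(place_id)
        self._cache[place_id] = place
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return place

    def contained_within(self, lat, lon):
        self.rpc_calls += 1  # always forwarded to the backend in this sketch
        return self.backend.contained_within(lat, lon)
```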
75. Interface / RPC
Watch those RPC queues!
Fail fast and potentially throw “over capacity” messages
⇢get(id) throws OverCapacity
⇢containedWithin(lat, long) throws OverCapacity
Distinguish between write path and read path
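One way to fail fast instead of letting RPC queues grow: admit a fixed number of in-flight calls and reject the rest immediately with OverCapacity. The sketch uses a non-blocking BoundedSemaphore. The gate class and the permit counts are hypothetical. Separate gates for the read and write paths keep a burst on one from starving the other.

```python
import threading

class OverCapacity(Exception):
    """Raised instead of queueing when the service is saturated."""

class FailFastGate:
    """Admit at most `max_in_flight` concurrent calls; reject the rest immediately."""

    def __init__(self, max_in_flight):
        self._permits = threading.BoundedSemaphore(max_in_flight)

    def call(self, fn, *args, **kwargs):
        if not self._permits.acquire(blocking=False):
            raise OverCapacity("over capacity, try again later")
        try:
            return fn(*args, **kwargs)
        finally:
            self._permits.release()

# Hypothetical limits; separate gates keep the read and write paths independent
read_gate = FailFastGate(max_in_flight=64)
write_gate = FailFastGate(max_in_flight=8)
```

Callers that get OverCapacity can back off or show an "over capacity" message rather than waiting behind a growing queue.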
80. Triangulation: Cellular
200m to 1km accuracy
Measuring signal strength to cell towers with known locations
If only one cellular tower is visible, fall back to cell tower
identification: better than nothing, but really inaccurate
Requires cellular modem, software, and lookups
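Once distances to towers at known positions have been estimated (turning signal strength into distance needs a propagation model, which is assumed here), three distances pin down a position. Subtracting pairs of circle equations cancels the squared unknowns and leaves a 2 x 2 linear system, solved below with Cramer's rule. This is a textbook sketch in a local flat x/y frame (e.g. meters east/north of a reference point), not how any carrier or vendor actually computes positions. With noisy distances or more than three towers, a least-squares fit would be the usual next step.

```python
def trilaterate(anchors, distances):
    """Locate a point from three known anchor positions and estimated distances.

    anchors: [(x1, y1), (x2, y2), (x3, y3)] in a local flat frame
    distances: [r1, r2, r3] in the same units
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    # Circle 1 minus circle 2, and circle 1 minus circle 3: a*x + b*y = c
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y
```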
81. Triangulation: Wifi
Sub 20m accuracy
Works indoors and in urban areas
Doesn’t need dedicated hardware, just an 802.11 radio
Relatively quick time to get a position
82. Triangulation: GPS
Sub 1m accuracy
Need dedicated GPS hardware
Prone to multi-path confusion especially in cities
Needs line of sight to the sky
Doesn’t work well indoors
Potentially takes a few minutes to get a lock
83. Association
IP address to geographical mapping
All done on the server side
Maybe “good” for city level
⇢ MaxMind is 83% accurate within 40km
⇢ Very error prone
⇢ Gets wonky when dealing with cellular
connections or rather large ISPs
Database needs to be refreshed fairly
frequently
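Server-side IP association comes down to a range lookup: keep the database's IP ranges sorted by start address and binary-search for the candidate range. The sketch handles IPv4 only and assumes non-overlapping ranges. The rows are hypothetical and use the RFC 5737 documentation blocks. A real table would be reloaded from the vendor's database on each refresh.

```python
import bisect
import ipaddress

class IpGeoTable:
    """Sorted, non-overlapping IPv4 ranges -> location label, searched with bisect."""

    def __init__(self, rows):
        # rows: iterable of (CIDR string, label)
        ranges = []
        for cidr, label in rows:
            net = ipaddress.ip_network(cidr)
            ranges.append((int(net.network_address), int(net.broadcast_address), label))
        ranges.sort()
        self._starts = [start for start, _, _ in ranges]
        self._ranges = ranges

    def lookup(self, ip):
        addr = int(ipaddress.ip_address(ip))
        i = bisect.bisect_right(self._starts, addr) - 1  # last range starting at or before addr
        if i >= 0:
            start, end, label = self._ranges[i]
            if start <= addr <= end:
                return label
        return None
```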
84. Extraction
Read the text and understand intent
Hard to tell whether someone is talking
from a place or about a place
Running text through a geocoder
(Google, Yahoo, Geocoder.us)
Parsing structured URLs and then
crawling “place pages”
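The foursquare tweet from the Search example shows both kinds of extraction: a structured "(@ Venue in Town)" fragment in the text and a short URL that points at a place page worth crawling. The patterns below are inferred from that single tweet and are assumptions, not a documented format. A venue name that itself contains " in " would be split incorrectly.

```python
import re

# Inferred from one check-in tweet; an assumption, not a documented format
CHECKIN = re.compile(r"\(@ (?P<venue>[^()]+?) in (?P<town>[^()]+)\)")
SHORT_URL = re.compile(r"https?://4sq\.com/\w+")

def extract_checkin(text):
    """Return (venue, town) from check-in style tweet text, or None if absent."""
    m = CHECKIN.search(text)
    return (m.group("venue"), m.group("town")) if m else None

def place_page_urls(text):
    """Short links that could be expanded and crawled for structured place data."""
    return SHORT_URL.findall(text)
```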
85. location in browser
Geolocation API Specification for JavaScript
navigator.geolocation.getCurrentPosition
Invokes a callback with a position object
position.coords has
⇢ latitude and longitude
⇢ accuracy
⇢ other stuff
Support in Firefox 3.5, Chrome 5, Opera 10.6, and others with Google Gears