TypePad Motion is a Django web application for creating community microblogging sites. It uses the TypePad API as a backend data store, so you don't have to worry about scaling, backups, or any related infrastructure. All you need is a web server.
21. httplib2
• Developed by Joe Gregorio
• Sports a number of features that are useful
for talking to web services
• Caching
• Compression
• Extensible authentication mechanism
24. remoteobjects
• We work a lot with RESTish web services at Six Apart
• Mark Paschal (@markpasc) created an abstraction layer
that allows us to create robust client libraries very
easily
• The remoteobjects library allows you to describe an
API using a declarative syntax, like Django models
• Works very well with RESTful APIs, but it’s generic
enough to work with RPC APIs too
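To make the "declarative syntax" idea concrete, here is a toy sketch of how class-level field declarations can drive decoding of an API response. It is not the remoteobjects implementation; the names Field, Resource, from_dict and ExampleUser are all made up for illustration.

```python
class Field(object):
    """Declares that an attribute is populated from a key in the API's JSON."""

    def __init__(self, api_name=None):
        self.api_name = api_name

    def __set_name__(self, owner, name):
        # Default to the attribute name when no explicit API key is given.
        if self.api_name is None:
            self.api_name = name


class Resource(object):
    """Base class that decodes a dict using the Field declarations on a subclass."""

    @classmethod
    def declared_fields(cls):
        return {name: value for name, value in vars(cls).items()
                if isinstance(value, Field)}

    @classmethod
    def from_dict(cls, data):
        obj = cls()
        for name, field in cls.declared_fields().items():
            setattr(obj, name, data.get(field.api_name))
        return obj


class ExampleUser(Resource):
    id = Field()
    screen_name = Field()
    followers = Field(api_name="followers_count")


user = ExampleUser.from_dict(
    {"id": 1, "screen_name": "example", "followers_count": 10})
```

The class body reads like a description of the API's data, and the generic base class does the decoding, which is the same division of labour a Django model has with the ORM.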
25. remoteobjects
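# Python 2 imports, matching the era of this example
from urllib import quote_plus, urlencode
from urlparse import urljoin, urlunsplit
from remoteobjects import RemoteObject, fields

class User(RemoteObject):
    # Each Field maps a key in the JSON response to an attribute
    id = fields.Field()
    name = fields.Field()
    screen_name = fields.Field()
    [ ... ]
    followers_count = fields.Field()
    status = fields.Object('status')

    @classmethod
    def get_user(cls, http=None, **kwargs):
        url = '/users/show'
        if 'id' in kwargs:
            url += '/%s.json' % quote_plus(kwargs.pop('id'))
        else:
            url += '.json'
        # Any remaining keyword arguments become the query string
        query = urlencode(kwargs)
        url = urlunsplit((None, None, url, query, None))
        # Twitter is defined elsewhere in the example and supplies the endpoint
        return cls.get(urljoin(Twitter.endpoint, url),
                       http=http)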
26. remoteobjects
>>> t = twitter.Twitter()
>>> t.add_credentials('mjmalone', 'thisisnotmypassword')
>>> friends_timeline = t.friends_timeline()
>>> [e.user.screen_name for e in friends_timeline.entries]
['ronaldheft', 'joestump', 'donttrythis',
'laughingsquid', 'FrankGruber', 'courtstarr', 'mbaratz',
'djchall', 'dozba', 'chrismessina', 'pop17', 'ijustine',
'calden', 'bryanveloso', 'jessenoller', 'pierre',
'optimarcusprime', 'snackfight', 'shiralazar']
29. What?
• A mechanism for retrieving several distinct
resources in a single HTTP request.
30. TypePad API URLs
• The TypePad API is a REST API
• Each distinct resource has its own URL
which supports some subset of the
operations GET, PUT, POST and DELETE.
37. Motion’s Home Page
Requires 5 resources from api.typepad.com:
/users/apparentlymart
/users/apparentlymart/relationships/@following
/users/apparentlymart/relationships/@follower
/groups/{id}/events
/groups/{id}/memberships
38. How can we retrieve
those five resources
from TypePad as quickly
and efficiently as
possible?
39. Design Goals
• Retrieve all required resources in parallel
• Allow TypePad to optimize the multi-get to
return a response as quickly as possible
• Be consistent with how single resources
are returned in the normal case
• Invent as little as possible
40. Parallel HTTP Requests
• Reasonably easy from the client’s
perspective, assuming that their HTTP
client library doesn’t suck.
• Difficult to optimize predictably on
TypePad’s end because we can’t control the
order and timing of the handling of
individual requests.
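As a rough illustration of the client side of parallel requests, the sketch below fans the five home page paths out over a thread pool. The fetch function is passed in, and fake_fetch is a stand-in that makes no network calls; fetch_all and HOME_PAGE_PATHS are names invented for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# The five resources the home page needs, as listed on the earlier slide.
HOME_PAGE_PATHS = [
    "/users/apparentlymart",
    "/users/apparentlymart/relationships/@following",
    "/users/apparentlymart/relationships/@follower",
    "/groups/{id}/events",
    "/groups/{id}/memberships",
]


def fetch_all(paths, fetch, max_workers=5):
    """Call fetch(path) for every path concurrently; return {path: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(paths, pool.map(fetch, paths)))


def fake_fetch(path):
    """Stand-in for a real HTTP GET; sleeps to imitate network latency."""
    time.sleep(0.05)
    return "response for " + path


results = fetch_all(HOME_PAGE_PATHS, fake_fetch)
```

The client waits roughly as long as the slowest request rather than the sum of all five, but each request still reaches the server independently, which is the server-side problem the second bullet describes.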
41. HTTP Pipelining
• Poor support from libraries and
infrastructure.
• Responses must be returned in order of
request, which complicates the API
dispatcher.
44. Batch Processing API
• GData (ab?)uses Atom to define a list of
operations.
• AWS uses some AWS-specific query string
arguments or SOAP-ish stuff.
• Ideally we wanted batch processing to be
an orthogonal layer.
45. Batch Processing API
• Long story short: we designed a batch
processing mechanism that can hopefully
apply to any REST-based API.
46. Batch Processing API
• Batch Processor is conceptually similar to
an HTTP proxy.
• The frontend accepts a POST request whose body
is a MIME multipart message holding a bunch of
requests, and returns a multipart message
containing the corresponding responses.
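A minimal sketch of what building such a request body could look like, using only Python's standard email package. The media types (multipart/mixed, application/http) and the X-Batch-Request-ID header are assumptions chosen for illustration; the published spec defines the real ones. build_batch_message is a hypothetical helper, not part of batchhttp.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart


def build_batch_message(requests, boundary="batch-boundary"):
    """Wrap (method, path, host) tuples in one multipart MIME message."""
    batch = MIMEMultipart("mixed", boundary=boundary)
    for index, (method, path, host) in enumerate(requests, start=1):
        part = Message()
        # Assumed media type and ID header; check the spec for the real ones.
        part["Content-Type"] = "application/http"
        part["X-Batch-Request-ID"] = str(index)
        # The body of each part is an ordinary HTTP request message.
        part.set_payload(
            "%s %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (method, path, host))
        batch.attach(part)
    return batch


batch = build_batch_message([
    ("GET", "/users/apparentlymart", "api.typepad.com"),
    ("GET", "/users/apparentlymart/relationships/@following",
     "api.typepad.com"),
])
wire_text = batch.as_string()
```

A client would POST this serialized message to the batch endpoint in a single request, instead of issuing one request per resource.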
47. Batch Processing API
• Conceptually, the backend of the batch
processor makes its own HTTP requests to
the indicated resources like an HTTP proxy
would.
• In practice, TypePad’s implementation
handles the entire batch job together and
fakes up the separate responses.
48. Batch Processing API
• We’ve published a spec and some open
source sample implementations in the hope
that it’s useful to the community:
http://martin.atkins.me.uk/specs/batchhttp
http://github.com/sixapart/batchhttp
http://github.com/sixapart/libhttp-request-multi-perl
- The bottom layer is remoteobjects & batchhttp; we’ll talk about those later.
- The Python TypePad API library is a pure Python client library for talking to the TypePad API.
- typepadapp is a Django application that provides basic functionality that’s generally useful for Django apps built on top of TypePad, such as user authentication, OAuth token management, session management, and session synchronization.
- typepad-motion is really just a thin layer on top of these building blocks - it’s a set of views, URLConfs, templates, and some static resources like CSS, images, and JavaScript.
So, to simplify things to a stupid degree, the Motion architecture looks something like this.
PyPI, pip, distribute, setuptools, etc. help a lot. We could decompose Motion into separate apps and maintain a good design, but still make it really easy to install. It also lets us leverage outside code like httplib2 without having to include all of that stuff in our distributions.
We’re using a custom management command to install our own skeleton files for a TypePad project. You don’t have to use it, but it will set up sane defaults for all the Motion settings. Obscure settings are in motion.settings, so our default settings.py imports * from there.
Turns out you can add management commands in Django really easily. Django iterates over your installed apps and looks in management/commands/<commandname>.py for a class called Command that subclasses django.core.management.base.BaseCommand. So once you have a project it’s easy to add commands. It’s a bit trickier for django-admin.py because you don’t have any installed apps by default. But you can specify a settings file to django-admin.py and it’ll load those settings before it tries to find the command. Score!
Automatically creates the directory structure for theming & static resources, and creates a urlpattern for static media. Stupid Django 1.1.1 “security patch” broke this stuff though; it’s fixed in Motion trunk, but not in the PyPI version.
All of our templates just extend motion/base/<template>.html. Thus, you can create a template _with the same name_, also extend motion/base/<template>.html, and override specific blocks. This is basically an alternative to letting you create templates that have different names and then pass them in as kwargs in your urls.py.
We decomposed our templates a lot to make them more extensible. Problem is, whenever a template is rendered Django has to read it from disk, lex it, parse it, and then render it. It doesn’t do any caching. This was taking around 1ms per template. We were rendering hundreds of templates. It adds up.
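One common remedy for this kind of cost is to keep the parsed template in memory after the first load. The notes do not say this is what Motion did, so treat the following as a generic sketch: string.Template stands in for a compiled Django template, and CachingLoader, read_source and the template name are all hypothetical.

```python
import string


class CachingLoader(object):
    """Parse each template once and reuse the parsed object afterwards."""

    def __init__(self, read_source):
        self._read_source = read_source  # callable: template name -> source text
        self._parsed = {}
        self.reads = 0                   # counts how often the source was read

    def get_template(self, name):
        template = self._parsed.get(name)
        if template is None:
            source = self._read_source(name)
            self.reads += 1
            # string.Template is a stand-in for the real lex-and-parse step.
            template = string.Template(source)
            self._parsed[name] = template
        return template


# A dict plays the role of the disk in this example.
FAKE_DISK = {"motion/base/example.html": "Hello, $name!"}

loader = CachingLoader(FAKE_DISK.__getitem__)
first = loader.get_template("motion/base/example.html").substitute(name="Motion")
second = loader.get_template("motion/base/example.html").substitute(name="TypePad")
```

With hundreds of small templates per page, paying the read-and-parse cost once per process instead of once per render is where the saving would come from.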
All Django requires for a “view” function is a “callable” that returns an HttpResponse. You can use a function to do this, but you can also use classes. We decided to use classes so we could provide hooks that make it easier to extend Motion - like batch HTTP stuff.
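The idea of a class acting as a view can be sketched without Django at all. Below, FakeResponse stands in for django.http.HttpResponse so the example runs on its own, and the hook names gather and render are invented for illustration rather than taken from Motion.

```python
class FakeResponse(object):
    """Stand-in for django.http.HttpResponse so the sketch runs without Django."""

    def __init__(self, content, status=200):
        self.content = content
        self.status = status


class BaseView(object):
    """An instance is callable, so it can be used wherever a view function is."""

    template = ""

    def __call__(self, request, **kwargs):
        # Keep per-request state in a local dict, not on the shared instance.
        context = dict(kwargs)
        self.gather(request, context)
        return self.render(request, context)

    def gather(self, request, context):
        """Hook: subclasses add the data they need (e.g. queue API requests)."""

    def render(self, request, context):
        return FakeResponse(self.template % context)


class MemberView(BaseView):
    template = "Profile of %(username)s (%(followers)d followers)"

    def gather(self, request, context):
        context["followers"] = 0  # placeholder for data fetched from the API


member_view = MemberView()
response = member_view(request=None, username="apparentlymart")
```

Because the request flow lives in the base class, a site built on top only has to override the hook it cares about, which is the extensibility argument made above.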
Here’s an example of a RemoteObject class representing a Twitter user. The code for this comes with the remoteobjects package, by the way. There are a number of other examples that come with the package as well.
- For context I’ll just summarize the resource model for the TypePad API
Data API URLs always start with a noun and an id
A bare URL like this retrieves a single object
A third path level selects a sub-resource of a top-level object
Many of these are lists
List resources tend to support filters which constrain what items are returned.
There are both parameterized filters...
...and boolean filters.
and in some cases multiple filters can be combined together.
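The URL pattern described here can be sketched as a small path builder. The boolean filter @following comes from the home page example; the @name/value layout used for parameterized filters, the by-group filter name and the resource_path helper are assumptions made for illustration and are not taken from the TypePad documentation.

```python
from urllib.parse import quote


def resource_path(noun, object_id, sub_resource=None, filters=()):
    """Build /noun/id[/sub-resource][/@filter[/value]...] from its pieces."""
    segments = [noun, object_id]
    if sub_resource:
        segments.append(sub_resource)
    for item in filters:
        if isinstance(item, tuple):
            # Parameterized filter: assumed to be written as @name/value.
            name, value = item
            segments.extend(["@" + name, value])
        else:
            # Boolean filter: just @name.
            segments.append("@" + item)
    # Keep "@" readable; percent-encode anything else that is unsafe.
    return "/" + "/".join(quote(str(s), safe="@") for s in segments)


single = resource_path("users", "apparentlymart")
following = resource_path(
    "users", "apparentlymart", "relationships", filters=["following"])
combined = resource_path(
    "users", "apparentlymart", "relationships",
    filters=[("by-group", "example-group"), "following"])  # hypothetical
```

Here single is a bare object URL, following selects a filtered list sub-resource, and combined shows how several filters might be chained.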
Every page in Motion is constructed predominantly from data retrieved from TypePad via the API.
We need to get all of the required data out of TypePad as quickly as possible.
At first, we just used parallel HTTP requests, which is pretty straightforward.
This proved inefficient within TypePad’s infrastructure since we ended up with each request in its own Apache process.
While other architectures are possible, completely reinventing our stack would be time-consuming.
We briefly considered HTTP Pipelining as a purist solution,
but in practice we found various problems.
- Poor client library support
- Poor infrastructure support (our load balancers don’t support pipelining)
- Implementation would’ve been complicated due to having to maintain the request order.
We eventually settled on having a special endpoint for submitting batch requests.
The client submits a single request to the batch processor which contains a description of all of the constituent requests.
The individual requests get handled, and are returned together as a single HTTP response.
But we need to figure out what a description of a batch job looks like on the wire...
We looked at some prior art here, but everything seemed to be specific to the API it was wrapping.
Figuring that batch processing ought to be transparent, we wanted something that could exist as a separate infrastructure layer.
Think of it as an HTTP proxy with a funny-looking frontend.
We use a multipart MIME message with each part containing an HTTP message as our wire format.
This allows HTTP messages to be wrapped with minimal overhead.
Although our internal implementation actually does something more clever, the protocol presents the illusion that each request is being handled independently as if it were a proxied request to a data endpoint.
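To show how a client might unpack such a response, here is a sketch that splits a multipart body into its individual HTTP responses using the standard email package. The media types, the boundary and the sample payloads are made up for the example, and parse_batch_response is a hypothetical helper rather than the batchhttp library's API.

```python
import email


def parse_batch_response(content_type, body):
    """Split a multipart batch response into one dict per HTTP response."""
    # The email parser needs the outer Content-Type to learn the boundary.
    message = email.message_from_string(
        "Content-Type: %s\r\n\r\n%s" % (content_type, body))
    if not message.is_multipart():
        raise ValueError("expected a multipart body")
    responses = []
    for part in message.get_payload():
        raw = part.get_payload().replace("\r\n", "\n")
        head, _, payload = raw.partition("\n\n")
        status_line, _, header_text = head.partition("\n")
        version, status, reason = status_line.split(" ", 2)
        headers = email.message_from_string(header_text + "\n\n")
        responses.append({
            "status": int(status),
            "reason": reason,
            "headers": dict(headers.items()),
            "body": payload,
        })
    return responses


# A made-up two-part response body, for illustration only.
SAMPLE_BODY = "\r\n".join([
    "--batch",
    "Content-Type: application/http",
    "",
    "HTTP/1.1 200 OK",
    "Content-Type: application/json",
    "",
    '{"id": "example"}',
    "--batch",
    "Content-Type: application/http",
    "",
    "HTTP/1.1 404 Not Found",
    "Content-Type: text/plain",
    "",
    "no such resource",
    "--batch--",
    "",
])

responses = parse_batch_response("multipart/mixed; boundary=batch", SAMPLE_BODY)
```

Each part carries its own status line and headers, so one failed sub-request (the 404 here) does not affect how the others are reported.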
We think this approach ought to work for other REST-based APIs, so we made a point of keeping it generic. We’ve released a draft specification and some sample implementations, including a Python client library which works with httplib2 and a Twisted-based proxy that can be used to put a batch processor frontend in front of some existing endpoints.