A few weeks ago I blogged about the forthcoming Cloudworks application programming interface (API) and published a document for review. Today, I thought I would explain some of the decisions behind the design of the API and share some of the lessons we're learning. We'll also touch on Javascript widgets and possible next steps. And we'll try not to "sell" REST or get into a holy war!


Rationale

We were concerned about creating a usable, easy-to-understand URL scheme. However, I also found that I prefer quite a strict REST approach. So we spent some time oscillating between what we have today and variations like "/api?method=cloud.getInfo&cloud_id=123". There was a feeling that the latter was almost self-documenting. I hope the result we came to is fairly understandable:

  /api/{item}/{term}[/{related}].{format}

In the end, our reasoning was that it is useful to have the things at the core of your data model (in our case clouds, cloudscapes, users and tags) actually in the URL, for example "/api/clouds/123/followers" (get the followers of a cloud). Optional parameters like "count" and "orderby", and meta-data like an API key, are expressed as GET parameters, e.g. "?count=5&api_key=12345". The ability to express the output format like a file extension, e.g. ".json" (or ".xml", soon), was taken from Twitter's API among others. These ideas make the pattern of the API calls more predictable for developers. As an added benefit they also make HTTP caching easier, again because the URLs are more patterned and predictable. We made the late decision to use plurals throughout, e.g. "/api/cloudscapes ...". This is to allow us to extend the API - we haven't yet implemented the call "/api/clouds" (get 'all' clouds, ordered by...), but we have made it easier to add this to the scheme.
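
To make the pattern concrete, here is a minimal Python sketch of how a client or server might split such a URL into its parts. It is an illustration only, not the Cloudworks code: the function name parse_api_url and the exact character classes in the regular expression are assumptions, and the sketch treats the ".{format}" suffix as required, as the pattern above suggests.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed reading of /api/{item}/{term}[/{related}].{format}
API_PATH = re.compile(
    r"/api/(?P<item>[a-z]+)"      # core data-model item, e.g. "clouds"
    r"/(?P<term>[^/.]+)"          # an ID or keyword, e.g. "123"
    r"(?:/(?P<related>[a-z_]+))?" # optional related items, e.g. "followers"
    r"\.(?P<format>[a-z]+)"       # output format, e.g. "json"
)


def parse_api_url(url):
    """Split an API URL into its path parts and its GET parameters."""
    parts = urlsplit(url)
    match = API_PATH.fullmatch(parts.path)
    if match is None:
        raise ValueError(f"Not a recognised API path: {parts.path}")
    call = match.groupdict()
    # parse_qs returns a list per parameter; keep the first value of each.
    call["params"] = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return call
```

Calling parse_api_url("/api/clouds/123/followers.json?count=5&api_key=12345") gives the item "clouds", the term "123", the related part "followers", the format "json" and the two GET parameters. When the optional part is left out, "related" is None.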

We chose JSON as the first output format as it was requested by our first client, SocialLearn, and critically it is perhaps the most universal format, for both in-browser and server-side scripting. There is also an as yet undocumented callback parameter to allow for JSON-P, which is needed when in-browser scripts (using the jQuery library, for example) call the API from another domain. For example, "/api/clouds/123.json?callback=My.function2&api_key=...". I also decided later on to make the outer element of the JSON response always an object, so "lists" or arrays of items sit within an object. This gives greater uniformity and the ability to add more meta-data, at the cost of a little more complexity. Generally, the response is as simple as possible, while providing links to other parts of the API. An example is "tags", which changed from a simple array of tag names:

"tags": [
    "OULDI",
    "Learning design"
] 

to an array of objects, each containing an "api_url":

"tags": [
    {"name":"OULDI", "api_url":"http://cloudworks..."},
    {...}
] 
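
The two response decisions described above (an outer element that is always an object, and an optional JSON-P callback) can be sketched in a few lines of Python. This is a hedged illustration rather than the actual Cloudworks implementation: the function name render_json, the wrapper key "items" and the rule for which callback names are accepted are all assumptions.

```python
import json
import re

# Assumption: accept only dotted JavaScript identifiers such as "My.function2",
# so that arbitrary script cannot be injected through the callback parameter.
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def render_json(payload, callback=None):
    """Serialise a response, optionally wrapped in a JSON-P callback."""
    # Lists are wrapped so that the outer element is always an object.
    # The key "items" is a hypothetical name chosen for this sketch.
    if isinstance(payload, list):
        payload = {"items": payload}
    body = json.dumps(payload)
    if callback is None:
        return body
    if CALLBACK_RE.fullmatch(callback) is None:
        raise ValueError("Invalid callback name")
    # JSON-P: the client receives a script that calls its own function.
    return f"{callback}({body});"
```

Because the outer element is an object, extra meta-data can later be added alongside the list without breaking clients that already read the wrapped key.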

One issue that caused some, perhaps unexpected, discussion was API keys. These are a common way of monitoring and, if necessary, controlling the use of an API by client software, typically as a GET parameter, e.g. .../api/clouds/active?api_key=1234 . So far, so simple. At least four issues have come up. One, how do you make an API open and easy to use, while at the same time not risking overloading servers? We obviously don't have the vast server farms that Google and others have. Two, if you decide to use API keys, how do you keep them secure, especially for Javascript clients? Anyone can look in the client's source HTML and Javascript to see the key. Three, how do you write documentation and example code that gives an example API key that actually works(!) - to make it easy to get started, while controlling the use of that key? Four, how do you not get bogged down in this and the issue of rate-limiting, and actually get something out there?! Our approach is to require the use of API keys at least initially, to log every API request (with IP address, user-agent string and so on), to not try to implement rate-limiting too early, and to monitor how things progress. And we may have some answers to the Javascript question - more soon. We also have an idea to allow the API key to be put in the HTTP request headers, like YouTube/Google allow, and Wixi prefer. This may be better for HTTP caching - any feedback on this would be particularly welcome.
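
As a rough illustration of the approach above, the following Python sketch looks for the key first in the GET parameters and then in a request header, and builds a log record for each request. It is not the Cloudworks code: the header name "X-Api-Key", the function names and the fields of the log record are assumptions made for the example.

```python
from urllib.parse import parse_qs

# Assumption: a hypothetical header name; the post does not name one.
API_KEY_HEADER = "x-api-key"


def find_api_key(query_string, headers):
    """Return the API key from the GET parameters or, failing that, a header."""
    params = parse_qs(query_string)
    if "api_key" in params:
        return params["api_key"][0]
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == API_KEY_HEADER:
            return value
    return None


def log_record(path, query_string, headers, ip_address):
    """Build one log entry per API request (field names are illustrative)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "api_key": find_api_key(query_string, headers),
        "path": path,
        "ip_address": ip_address,
        "user_agent": lowered.get("user-agent", ""),
    }
```

One plausible reason the header variant may help caching is that the key no longer appears in the URL, so two clients asking for the same resource would request an identical URL. Whether a given cache takes advantage of that depends on how it is configured.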

Lessons

These are some of the lessons I've learnt through the API work. These build on tips from others.

  1. Consider "feature flags" in place of a branch in your code repository (a tip from the Flickr developers). This isn't specific to APIs. The internationalization work I did earlier this year was completed on a branch, which left a minor merge headache at the end. For the API work I committed to 'head' and put flag-variables in the application configuration files. So, the code was on the live site before we flicked the switch. (Admittedly, this was easier because the API work touched less of the code.)
  2. Write test scripts early. (Well, I already knew I should write unit tests, but those can be tough, right!) I've picked up a number of bugs in the API by developing a test harness for all 24 calls. For example, I'm currently working on an XML output, as an alternative to JSON. This all looked really easy, until regression tests showed some calls failing - I was glad to pick up the bug early (the fix wasn't too difficult). And, the test script gave me a warm fuzzy feeling before we had a real API user (sad?!). The test script uses cURL and does some basic checks on the response, and some of the tests existed before the implementations. There is also a test of the Javascript widgets (more below).
  3. Handle errors properly. It makes using your API easier if you use the HTTP error codes, and use them correctly. Don't return a 200 for errors. The exception I found was for Javascript - if you return an error code the script won't run. (And yes, I've coded some error handling into the Javascript.) I also put PHP's error reporting level quite high and forced the display of errors through the API. I think this helped initially, though of course it's no good for production.
  4. KISS, keep it simple stupid - start small, both in terms of the number of calls, and by keeping the response simple. Ideally, start with calls based on one or more use-cases. We were perhaps fortunate to start with JSON, which after looking at YouTube's GData API I deliberately kept quite flat and simple. (YouTube's GData JSON format directly encodes Atom, so it contains multiple XML namespaces and so on.) My approach to XML is often to add multiple namespaces - not good for the first response format.
  5. Look at other APIs, use them.
  6. Ideally start with a stable database schema and real content. These were two ways in which I think we were fortunate with Cloudworks. So far I've only had to make 3 or 4 changes to the data model. If you're developing an API for a new web site try to delay the API. A few weeks may make all the difference.
  7. Use the API yourself! It makes it easier if you have some use-cases in mind. The Javascript widgets we've been working on made me think about the consistency of the response. For more on how we're starting to use our API, look at 'Next Steps' below.
  8. Consider API keys and rate limiting early. As noted above there will be issues to resolve.
  9. Get it out there. That is why we haven't dealt with authentication, posting clouds and so on. We were concerned at times about the tight time schedule we've kept to and the potential for hasty decisions. However, on balance I'm glad we've got something out, in the open and it's starting to be used - thanks guys!
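
Lesson 3 in the list above, including the Javascript exception, could look something like the following Python sketch. The payload shape with "error", "code" and "message" is an assumption for illustration, as is the function name; it is not the format the Cloudworks API returns.

```python
import json


def error_response(status, message, callback=None):
    """Return (http_status, body) for an API error.

    Plain JSON clients get the real HTTP status code. JSON-P clients get
    a 200, because a browser will not run a script that arrives with an
    error status, so the code travels inside the payload instead.
    """
    # Hypothetical error payload; the outer element is still an object.
    body = json.dumps({"error": {"code": status, "message": message}})
    if callback is None:
        return status, body
    return 200, f"{callback}({body});"
```

A Javascript client then needs its own check for the "error" member of the response, which is the kind of error handling mentioned in lesson 3.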

Next steps

We have a simple XML response format to match the existing JSON response in the pipeline. And we're working on some Javascript widgets for your blog or web-site, for example, to display the last 5 items from your cloudstream or the clouds associated with a tag. We are taking our cue from Delicious and Twitter which make this really simple for regular users. The idea will be to give every authenticated user a "Get Javascript embed code" button.

Looking further ahead, we'd like to tackle adding clouds and perhaps comments to the site through the API (this is an API after all, not a set of feeds ;). This inevitably means tackling authentication, possibly using OAuth. And we need to deal with paging of large responses. Looking at the Guardian's API explorer, we think that this would be a really useful way for developers to dip their toes in, so we'd like to do something similar. Performance and caching are also on our radar, for the Cloudworks site as a whole and the API specifically. However, the site coped well with the recent OU conference, so this is less of a concern than it was. We will be converting the static PDF API document to Wiki pages when the opportunity arises.

Thank you to SocialLearn, who funded the initial API development, and to the Cloudworks lead developer, Juliette, who worked closely with me. Finally, I must say that we'd love you to use our API! Please look at the document, create an account on Cloudworks and email us for an API key.


Comments

RESTful APIs

I think you are absolutely right to have used the URI path rather than the query string. It's the way the web works best. I recently designed the API for straight-street.com and came to that conclusion after reading Fielding's thesis and talking to Ian Boston of SAKAI at an OSS Watch event. I could have done with your notes then.

I went through some of the issues with keys you mention, and in the end we just have an open key for tracking purposes. Like you, there are no updates yet. I also provided JSONP to get round that chuffing same domain policy (be so good when this goes away).

Due to my preferences and hosting limits it's a Python CGI script which is a little slow. I also provided a little example app and the API is now being used in a program which will be released soon by the Accessibility group at ECS.

Cheers Steve