Best practices for writing an API?

dlebauer · January 25, 2016, 6:43pm

We (developers of betydb.org) have an API that is used by the traits package. However, it was built in an ad-hoc fashion and we are revising it (rewriting it) to follow more standard conventions. Our first issue is to add GET endpoints for individual tables (https://github.com/PecanProject/bety/issues/381).

Is there a set of ‘best practices’ for writing APIs, or are there recommendations that would make it particularly easy to develop R packages such as traits around them?

sckott · January 25, 2016, 7:39pm

Good question @dlebauer

Here goes my opinion (w/ caveat that I’ve only ever worked on one API)

I concur with the points in PecanProject/bety issue #381 (“BETY API continuing development: add v0 GET endpoints”) about:

- using the right HTTP verbs
- using plural endpoints
- using query params for optional filters

Additionally:

- Fail well
  - Use the appropriate HTTP status codes, e.g., when someone tries a POST request against a route that only allows GET, then a 405 Method Not Allowed is appropriate
  - If you have error messages, put those in the JSON response body, not an HTML-ized stack trace
- Use gzip compression to make data sent over the wire smaller (maybe there’s better compression out there, not sure)
- Use a changelog or news file, and list changes in the API so that developers can quickly know what they need to change in their client/script (as opposed to finding out when their code breaks or looking through the API code itself)
- Good docs (obviously)
- If you allow geometry searches, WKT strings can get long fast, and you can run up against 414 (URI Too Long) HTTP errors, so allowing a POST request is good in those cases - though moot if you don’t do WKT geometry searches
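
To make the "fail well" and gzip points concrete, here is a minimal sketch in Python. It is not BETY's code: the route table, the paths, and the `handle` function are all hypothetical, and a real service would do this inside its web framework. It only shows the shape of the response: a correct status code, an `Allow` header on a 405, a JSON error body, and gzip applied only when the client asks for it.

```python
import gzip
import json

# Hypothetical routing table: path -> set of allowed HTTP methods.
ROUTES = {"/traits": {"GET"}, "/searches": {"GET", "POST"}}


def handle(method, path, accept_encoding=""):
    """Return (status, headers, body_bytes) for one request."""
    headers = {}
    if path not in ROUTES:
        status = 404
        payload = {"error": "Not Found", "detail": f"no route {path}"}
    elif method not in ROUTES[path]:
        status = 405
        payload = {
            "error": "Method Not Allowed",
            "detail": f"{method} is not supported on {path}",
        }
        # A 405 response should tell the client which methods do work.
        headers["Allow"] = ", ".join(sorted(ROUTES[path]))
    else:
        status = 200
        payload = {"data": []}  # placeholder for the real result

    # Errors and successes alike go out as JSON, never as an HTML stack trace.
    body = json.dumps(payload).encode("utf-8")
    headers["Content-Type"] = "application/json; charset=utf-8"

    # Compress only when the client advertises gzip support.
    if "gzip" in accept_encoding.lower():
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return status, headers, body
```

For example, `handle("POST", "/traits")` yields a 405 with `Allow: GET` and a JSON body that a client in R or any other language can parse without special-casing.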

Best practices for RESTful APIs will make it easy to develop against in R, and any other language

Some APIs I like (and I think are designed well):

- GBIF REST API
- Crossref REST API: https://github.com/CrossRef/rest-api-doc/blob/master/rest_api.md
- Lagotto API (@mfenner created this - he’s a good person to ask about APIs)

mfenner · January 25, 2016, 7:48pm

I have recently started to use the JSONAPI spec and really like how many of the things that go into API design are standardized, without creating a lot of overhead. A perfect example is error handling in JSONAPI.
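
As a rough illustration of what standardized error handling looks like, the JSON API spec puts errors in a top-level `errors` array of error objects, with members such as `status` (the HTTP status code as a string), `title`, and `detail`. The helper below is a hypothetical sketch in Python, not part of any JSON API library:

```python
import json


def jsonapi_error(status, title, detail=None):
    """Build a JSON API style error document with a single error object."""
    error = {"status": str(status), "title": title}  # status is a string in the spec
    if detail is not None:
        error["detail"] = detail
    return {"errors": [error]}


if __name__ == "__main__":
    doc = jsonapi_error(404, "Not Found", "No work with that DOI")
    print(json.dumps(doc, indent=2))
```

The response carrying this document would still use the matching HTTP status code; the body adds the human-readable explanation.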

I actually started a new API this weekend; a very early version has been available since this morning at https://api.labs.datacite.org. It borrows a lot of the ideas that Karl Ward used in the CrossRef REST API that @sckott mentioned, but uses JSONAPI.

Is there an R client library for JSONAPI?

mfenner · January 25, 2016, 7:51pm

No R client mentioned on the JSONAPI implementations page…

For API documentation I use Swagger. And I learned today from Geoff Bilder that it is the foundation for the Open API Initiative standard.

sckott · January 25, 2016, 7:52pm

ha, I did start one: rjsonapi (ropensci-archive/rjsonapi on GitHub, now archived: a consumer for APIs that follow the JSON API specification) - but it’s not really usable yet

mfenner · January 25, 2016, 7:54pm

That makes it a good fit for my not-yet-really-usable server implementation. One other thing I like in JSONAPI besides error handling is pagination and previous/next links.
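
A sketch of why the next links are convenient for client authors: the client never has to compute page numbers, it just follows `links.next` until it is missing or null. This is illustrative Python, not an existing client. `get_json` is a hypothetical callable that takes a URL and returns the decoded JSON document, and the code assumes each page's `data` member is a list of resources.

```python
def next_url(page):
    """Return the URL of the next page, or None when there is none."""
    link = (page.get("links") or {}).get("next")
    # JSON API allows a link to be a plain string or an object with "href".
    if isinstance(link, dict):
        link = link.get("href")
    return link or None


def fetch_all(first_url, get_json, max_pages=1000):
    """Collect resources from every page by following links.next."""
    resources = []
    url = first_url
    pages = 0
    while url and pages < max_pages:  # max_pages guards against link loops
        page = get_json(url)
        resources.extend(page.get("data") or [])
        url = next_url(page)
        pages += 1
    return resources
```

In real use `get_json` would wrap an HTTP request; passing it in as a parameter keeps the paging logic easy to test with canned pages.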

sckott · January 25, 2016, 8:03pm

Right, forgot about JSONAPI, but yeah, that’s something you could consider following @dlebauer - at least in part - there’s not a lot of real world usage exposed to the public AFAIK (at least last time I checked)

sckott · January 26, 2016, 5:52am

@dlebauer I just stumbled upon a good example of the need to fail well. Been using the IUCN Red List API, and noticed that if you pass an invalid API key they do return a message that the key was invalid, but they return a 200 HTTP response (i.e., a successful response, which it definitely is not). e.g.,

 curl -v http://apiv3.iucnredlist.org/api/v3/species/citation/loxodonta%20africana\?token\=asdfadf

*   Trying 176.58.126.20...
* Connected to apiv3.iucnredlist.org (176.58.126.20) port 80 (#0)
> GET /api/v3/species/citation/loxodonta%20africana?token=asdfadf HTTP/1.1
> Host: apiv3.iucnredlist.org
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.1.19
< Date: Tue, 26 Jan 2016 05:49:10 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 30
< Connection: keep-alive
< X-Powered-By: Sails <sailsjs.org>
< Set-Cookie: sails.sid=s%3A3Jq0N87hJGY7lTyqWIeC308E.8BTMERywIyAzvxRhtPL6JTjBdt9Z4Dobh%2F87a8QMTp8; Path=/; HttpOnly
< Vary: Accept-Encoding
<
* Connection #0 to host apiv3.iucnredlist.org left intact
{"message":"Token not valid!"}

By contrast, the GitHub API is really nice, and returns a 401 Unauthorized when the authentication is bad: https://developer.github.com/v3/#client-errors

Anyway, the point is a naive client will proceed assuming everything is fine when a 200 response is received, but that’s not true

p.s., although, they were nice to put a meaningful error response in json
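
A small Python sketch of the client side of this problem. The `ApiError` class and `parse_response` function are hypothetical; the point is that a client can only branch on the status code it is given, so an error delivered with a 200 passes straight through as if it were data.

```python
import json


class ApiError(Exception):
    """Raised when the server reports a non-2xx status."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_response(status, body):
    """Decode a JSON body, raising ApiError unless the status is 2xx."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None  # body was not valid JSON
    if not 200 <= status < 300:
        message = "unknown error"
        if isinstance(payload, dict):
            message = payload.get("message", message)
        raise ApiError(status, message)
    return payload
```

With a 401, `parse_response(401, '{"message":"Token not valid!"}')` raises `ApiError`. With the 200 shown above, the same body is returned as a normal result, and the caller would have to inspect the payload for a `message` key to notice anything went wrong.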

mbjones · January 26, 2016, 7:46am

@dlebauer I’ll raise two design issues. First, efficiency. If you follow
REST principles meticulously, then it can lead to a lot of API calls (e.g.,
one call to /things to list the collection, then 100,000 calls to
/things/thing17, /things/thing18, etc. to get some property or data of
interest for each object). When building dynamic web UIs, this can lead to
the need for many REST calls to collect even simple sets of information to
display in the UI. So, a good pattern is to allow the collections-level
API (e.g., /things) to take some parameters so that properties of objects in
the collection can be returned in a batch operation (e.g., get the name,
size, and checksum of all things numbered between 37 and 10,337, with one
REST call). Your REST collections basically get turned into focused query
systems, and will need parameters for both field selection and query
criteria. REST purists detest parameterized calls because they aren’t
declarative and don’t correspond to the collections metaphor, but I think
they can be a huge gain for efficiency if used well. When writing the API,
ask yourself – how many calls to this API would a typical user make, and
design accordingly.
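
One way to picture the batch pattern is sketched below in Python. The parameter names (`fields`, `min_id`, `max_id`) and the in-memory list of things are assumptions made for illustration; a real server would translate the same parameters into a database query.

```python
from urllib.parse import parse_qs, urlsplit


def query_collection(url, things):
    """Apply field selection and an id range from the query string of `url`."""
    params = parse_qs(urlsplit(url).query)
    fields = params["fields"][0].split(",") if "fields" in params else None
    min_id = int(params["min_id"][0]) if "min_id" in params else None
    max_id = int(params["max_id"][0]) if "max_id" in params else None

    rows = []
    for thing in things:
        if min_id is not None and thing["id"] < min_id:
            continue
        if max_id is not None and thing["id"] > max_id:
            continue
        if fields is None:
            rows.append(dict(thing))  # no selection: return every property
        else:
            rows.append({name: thing[name] for name in fields if name in thing})
    return rows
```

A single request such as `/things?fields=name,size,checksum&min_id=37&max_id=10337` then replaces thousands of per-object calls, at the cost of a collection endpoint that behaves more like a query system.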

Second, follow REST design principles where it makes sense. The biggest
problem with many APIs is not being able to understand what it is you can
do with it, and I think the REST pattern helps clarify that. Think
carefully about just what your collections represent, and give them good,
pithy, descriptive names that people will understand. Don’t hide your
’users’ collection behind a ‘login’ REST URL – use ‘users’, or ‘accounts’.
Collections should be plural nouns. Objects in collections should have unique
identifying names. So, while I think there is a need to short-circuit REST
principles for efficiency at times, in general they are good guidelines.

Hope you find this commentary useful. If you put your proposed API up
somewhere before you implement it, I’ll bet you’ll get lots of good
commentary if you ask for it.

Matt

sckott · January 26, 2016, 4:53pm

Thanks for your thoughts @mbjones ! Agree on both points.

sckott · January 26, 2016, 11:23pm

Submitted a pull request to add the R client rjsonapi to implementations/index.md (json-api/json-api pull request #972). It still could use some work, and I may eventually want to make it a subset of sckott/request (an HTTP requests DSL for R).

knb · January 27, 2016, 1:31pm

REST API designers: how will you decide whether a user is logged in, and what to do in that case?
(Commonly, logged-in users may need to receive different, usually more, data than unauthorized users.)

How do you plan to implement the login-process and the corresponding data-fetching procedure on the server side?

A sequence of switch statements in every API call, with different blocks of code delivering different data to different clients, might work for simple use cases. But for larger projects, or large user bases with elaborate permission schemes, this might not work.

Would a model-view-controller architecture help in this case? What’s the proper design paradigm called these days?

petermeissner · January 27, 2016, 5:19pm

I am no expert but this sounds like the way to go: http://stackoverflow.com/a/14031061

Essentially, there should be no login, sessions, or such. Requests should be isolated.

mbjones · January 27, 2016, 7:43pm

That’s a great SO post. Thanks for the pointer. A minor nit – even with
HMAC, when authenticating there is always a login to establish credentials,
but the state of that login is encapsulated in the HMAC token, and so the
REST call is stateless, as all the information needed to validate the call
is in the token. We use just this type of approach with our DataONE auth
API, but we use JSON Web Tokens (http://jwt.io) rather than HMACs (that’s a
much longer and complicated discussion). You can now get a DataONE JWT
token for logging into the DataONE network (over 30 data repositories)
using our new user profiles that we announced today. Version 2 of our
’dataone’ R package supports these tokens and is being released this week.

Matt
