HTTP APIs :: An Introduction
As developers, we’ve been making HTTP APIs* almost since the moment HTTP started to see adoption; think of CGI scripts written in Perl or PHP (or don’t, if you’d rather not relive the nightmare of those good old days). In that time we’ve learned an awful lot about improving the structure of our code and the associated language-specific (e.g., Java, Ruby, Go) APIs. However, it doesn’t feel like we’ve made similar progress with our HTTP APIs. So, in this series of posts I want to take a deeper dive into various aspects of HTTP APIs and proffer some opinions on how we can do better.
“HTTP API” literally starts with HTTP, yet in my experience most developers don’t know much about the protocol. To them, HTTP is “whatever this library does”. I think that is problematic for two main reasons. First, there are lots of libraries out there and they don’t always agree. If you don’t know the underlying protocol, it can be hard to figure out how to resolve incompatibilities when they arise. Second, if I may paraphrase Santayana, “Those who cannot remember HTTP are condemned to reinvent it”. HTTP provides a lot of functionality “out of the box” to support many of the hard parts of API definition, evolution, and usage. A lot of developers seem to be unaware of these features and the lengthy thinking behind them, and so they create their own, often less well-considered, mechanisms. So, let’s start by establishing some high-level concepts and terminology for HTTP APIs.
Resources and Representations
The HTTP specification talks in terms of resources and representations. Resources are the “nouns” within a service. They may be binary blobs of data (e.g., image, audio, or video files), but in HTTP APIs it’s more common for them to be structured data (i.e., bags of key-value pairs) such as person, vehicle, or medical records. Whichever it is, from an HTTP perspective what makes it a resource is that there is at least one unique URL that can be used to address it.
To do something useful with a resource in an HTTP API system, clients and servers need to be able to pass it back and forth. That obviously means the data that constitutes the resource needs to be written to and read from the network. Representations are the different ways a resource can be encoded for transmission. If all the clients used the same programming language as the server, one representation might be whatever native serialization mechanism that language supports. However, most services have to assume both that clients could be written in any language and that representations will evolve over time. This is where language-agnostic formats like JSON, XML, CSV, Avro, Protocol Buffers, etc. come in.
A very common mistake is to assume that the representation is, or must be, identical to the resource. This is not the case. As Alfred Korzybski said, “The map is not the territory, the word is not the thing it describes.” Confusing resources and representations, or designing a service as if the two are equal, will lead to a lot of problems as the service evolves. I’ll cover these issues, and ways to avoid them, in my follow-on post focused on representations.
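To make the distinction concrete, here is a minimal sketch built around a hypothetical `Person` resource. The dataclass stands in for the resource as the service holds it internally. `to_json` and `to_csv` (illustrative names) produce two different representations of it. Neither representation mirrors the internal fields one-to-one, and all the field names are assumptions made for the example.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    """The resource as this hypothetical service holds it internally."""
    person_id: int
    given_name: str
    family_name: str
    birth_date: date


def to_json(person: Person) -> str:
    """One representation: a JSON document whose shape differs from the class."""
    return json.dumps({
        "id": person.person_id,
        "name": {"given": person.given_name, "family": person.family_name},
        "birthDate": person.birth_date.isoformat(),
    })


def to_csv(person: Person) -> str:
    """Another representation of the same resource: a header row plus one CSV row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "full_name", "birth_year"])
    writer.writerow([person.person_id, f"{person.given_name} {person.family_name}", person.birth_date.year])
    return buffer.getvalue()


ada = Person(42, "Ada", "Lovelace", date(1815, 12, 10))
print(to_json(ada))
print(to_csv(ada))
```

The CSV representation drops information (the full birth date) that the resource still holds. A representation can be a lossy, reshaped view of the resource rather than a dump of it.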
ReST vs RPC
ReST and RPC are the most common patterns for HTTP APIs in use today. ReST (Representational State Transfer) is a declarative-style API. The body of a request represents the state of a resource as the client wishes it to be, while the body of a response is the state of a resource as currently held by the server. When the client sends the server a new state, it is the server’s job to assess the current state of the resource and determine what needs to change to match the client-requested state. It then either performs the actions necessary to change the former into the latter or returns an error if such changes cannot be made. This means that, at their core, ReST-based services are finite-state machines where the server contains most of the complexity.
Remote Procedure Call (RPC) services follow an imperative style where most of the complexity lies with the client. The client must know the current state of the resource and which remote procedures (a.k.a. methods, functions) exist on the server. It must then call those procedures in the order necessary to get the server into the final state the client wants. It must also deal, to a greater degree, with the concurrency issues that can arise if another client starts making its own set of procedure calls against the same resource at the same time. An RPC request contains the name of the procedure (usually in the URL) and its arguments (usually in the request body, but sometimes as query parameters), while the response body contains the return value of the procedure call.
In my opinion, when deciding between ReST and RPC the main question is “where do we want to put the complexity?” In most cases, I think it belongs on the server side (and so I think ReST is a good default option). The developers of the server are almost always associated with the organization that “owns” the resources and the business logic that acts upon them. So, in theory, they know best how to implement that logic. Pushing a bunch of logic out to the clients seems like a recipe for highly inconsistent behavior and creates a much larger surface area for bugs.
The danger with this approach, though, is that a lot of developers don’t implement actual ReST services when they say they do. Again, ReST services require a lot of complexity on the server side, and when most developers want to “go fast” they simply skip over that complexity, in many cases without realizing they have done so. Regardless of how they got there, the overall system ends up in a position where clients think the server is handling the complexity and the server is implemented as if the clients are handling it. Poor experiences and bugs ensue.
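A minimal sketch of the server-side reconciliation described above follows. It assumes a hypothetical order resource with a status and a shipping address. `reconcile` compares the client’s desired state with the current state, checks each change against an allowed-transition table (the finite-state machine), and either applies only the needed changes or raises an error. The transition table, the field names, and the idea of mapping `ConflictError` to a 409 response are all illustrative assumptions.

```python
from dataclasses import dataclass, replace

# Hypothetical allowed status transitions for an order resource (the finite-state machine).
ALLOWED_TRANSITIONS = {
    "pending": {"paid", "cancelled"},
    "paid": {"shipped", "cancelled"},
    "shipped": {"delivered"},
    "delivered": set(),
    "cancelled": set(),
}


@dataclass(frozen=True)
class Order:
    status: str
    shipping_address: str


class ConflictError(Exception):
    """The requested state cannot be reached (a handler might map this to 409 Conflict)."""


def reconcile(current: Order, desired: Order) -> Order:
    """Work out what must change to turn `current` into `desired`, validate it,
    and return the new state; raise if the transition is not allowed."""
    changes = {}
    if desired.status != current.status:
        if desired.status not in ALLOWED_TRANSITIONS.get(current.status, set()):
            raise ConflictError(f"cannot move order from {current.status!r} to {desired.status!r}")
        changes["status"] = desired.status
    if desired.shipping_address != current.shipping_address:
        if current.status in {"shipped", "delivered"}:
            raise ConflictError("shipping address cannot change after the order has shipped")
        changes["shipping_address"] = desired.shipping_address
    return replace(current, **changes)


# A PUT handler would load the current state, call reconcile, then persist the result.
print(reconcile(Order("pending", "1 Main St"), Order("paid", "2 Elm St")))
```

The client only ever says “this is the state I want”. Knowledge of which transitions are legal stays on the server.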
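For contrast, the sketch below shows the RPC style, where the client carries the knowledge. It reuses the idea of a hypothetical order lifecycle. The procedure names (`payOrder`, `shipOrder`, `confirmDelivery`) and the `/rpc/...` URL layout are assumptions. The client has to know the current status and the available procedures, and it has to work out the call sequence itself. Following the conventions above, the procedure name goes in the URL and the arguments go in the body.

```python
import json
from typing import List, Tuple

# Hypothetical order lifecycle and the remote procedure that performs each step.
LIFECYCLE = ["pending", "paid", "shipped", "delivered"]
PROCEDURE_FOR_STEP = {"paid": "payOrder", "shipped": "shipOrder", "delivered": "confirmDelivery"}


def plan_rpc_calls(order_id: int, current_status: str, desired_status: str) -> List[Tuple[str, str, str]]:
    """Client-side logic: build the ordered list of (method, url, body) requests
    needed to move the order from current_status to desired_status."""
    start = LIFECYCLE.index(current_status)
    end = LIFECYCLE.index(desired_status)
    if end < start:
        raise ValueError("no procedure exists to move an order backwards")
    calls = []
    for status in LIFECYCLE[start + 1 : end + 1]:
        procedure = PROCEDURE_FOR_STEP[status]
        calls.append(("POST", f"/rpc/{procedure}", json.dumps({"orderId": order_id})))
    return calls


for method, url, body in plan_rpc_calls(42, "pending", "shipped"):
    print(method, url, body)
```

If another client calls `shipOrder` between these two requests, this plan is already stale. Detecting and recovering from that is also the client’s job in this style.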
APIs vs Representations
When discussing an HTTP API it can be somewhat unclear to what “API” actually refers. Is it each individual endpoint (so a given service with 20 different URLs has 20 APIs) or is the API the collection of endpoints the server offers (so, just one API in the previous example)? I think the second approach is consistent both with how people talk about “a service’s API” and how developers version a codebase rather than individual classes or methods. Such an API covers the headers, paths, query parameters and resources offered by the service.
If endpoints can be thought of as analogous to method signatures in a programming language, then representations are analogous to the implementations of the object types passed as arguments to those methods. Just as the signature of a method doesn’t change when the implementation of one of its argument types changes, an HTTP API remains unchanged when the representation of the resource it operates upon changes. That is, the API and its representations can evolve separately.
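Here is a minimal sketch of that separation. It assumes a hypothetical `GET /people/{id}` endpoint whose handler never changes, while the representation evolves from a v1 to a v2 shape. Selecting the representation by a vendor media type in the `Accept` header is just one possible approach, and the media type names are invented for the example. Real `Accept` parsing also handles lists and quality values, which this simplified version ignores.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical internal state of person resources, keyed by id.
PEOPLE = {7: {"given": "Grace", "family": "Hopper"}}


def person_v1(person: dict) -> dict:
    """Original representation: a single display name."""
    return {"name": f"{person['given']} {person['family']}"}


def person_v2(person: dict) -> dict:
    """Evolved representation: structured name fields."""
    return {"givenName": person["given"], "familyName": person["family"]}


# Hypothetical media types; labelling representation versions this way is only one option.
REPRESENTATIONS: Dict[str, Callable[[dict], dict]] = {
    "application/vnd.example.person.v1+json": person_v1,
    "application/vnd.example.person.v2+json": person_v2,
}


def get_person(person_id: int, accept: str) -> Tuple[int, str, str]:
    """Handler for GET /people/{id}: the endpoint itself never changes,
    only the representation chosen from a (simplified, single-value) Accept header."""
    encoder = REPRESENTATIONS.get(accept)
    if encoder is None:
        return 406, "text/plain", "Not Acceptable"
    person = PEOPLE.get(person_id)
    if person is None:
        return 404, "text/plain", "Not Found"
    return 200, accept, json.dumps(encoder(person))


print(get_person(7, "application/vnd.example.person.v1+json"))
print(get_person(7, "application/vnd.example.person.v2+json"))
```

Adding `person_v2` touched neither the URL nor the handler’s signature. Only the table of representations grew.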
HTTP Versions
There are multiple versions of HTTP: 0.9, 1.0, 1.1, 2, and, soon, 3. Today you can pretty safely ignore 0.9 and 1.0 as you don’t see them out in the wild anymore. So, let’s start with 1.1, which was originally defined by RFC 2616. When it came time to create HTTP/2, the authors wanted to keep the same semantics as HTTP/1.1 (e.g., the notion and behavior of headers, bodies, and methods) but represent that data very differently on the network. So they chose to take RFC 2616 and split it apart. This led to the “7230 series” of documents: RFCs 7230, 7231, 7232, 7233, 7234, and 7235. During this process they also clarified a lot of the specification’s meaning and incorporated the accumulated errata. The HTTP/2 and HTTP/3 standards were then able to refer to RFCs 7231-7235 (the semantic parts of HTTP/1.1) while replacing the on-the-wire encoding defined by RFC 7230. Put another way, HTTP/1.1 is now formally defined as RFC 7230 plus RFCs 7231-7235, and HTTP/2 is defined as RFC 7540 plus RFCs 7231-7235. HTTP/3 will be the in-draft protocol plus RFCs 7231-7235.
After reading all of that you may be thinking “Great. Thanks for the history lesson. Why should I care?” Because RFC 7230 (HTTP/1.1), RFC 7540 (HTTP/2) and QUIC HTTP (HTTP/3) can have very different performance and scaling characteristics and those characteristics and your needs around troubleshooting should inform both the protocol and representations you use.
HTTP/1.1 is a text-based protocol, which means you could capture the data off the network, literally open it in a text editor, and read the messages. That is great from a support standpoint and terrible from a network efficiency standpoint. Poor network efficiency means extra costs for the organization hosting the service and poorer performance for users. HTTP/1.1 also really sucks for transferring binary data. Because it’s text-based, binary data that travels inside the text (for example, inside a JSON representation or a header) needs to be converted into text, typically by base64-encoding it. That increases its size by roughly a third. HTTP/1.1 also suffers severely from head-of-line blocking. Responses on a connection must come back in the order the requests were sent, so a single slow response holds up everything queued behind it. As a result, clients will generally open more connections, which requires further compute and network resources and further increases the cost of running the service. These additional costs and the added latency aren’t a big deal under low loads, but they add up quickly as usage ramps up.
HTTP/2 made two substantial changes to its network encoding. First, it moved to a binary message format. This means things like the request and status lines and headers are represented in a much more network-efficient manner, and binary request/response bodies can be transported efficiently. Second, it moved to a “multiplexed” communication method: you can have many independent, virtual connections (called streams) inside one actual network connection. This reduces, but doesn’t fully eliminate, head-of-line blocking in HTTP/2. All the streams still share a single TCP connection, so one lost packet stalls every stream until it is retransmitted. Both of these changes make support more complex and thus costlier: you now need special software to decode requests and responses captured from the network.
HTTP/3 improved some of the binary encoding of messages, but its big change was to move from TCP, used by all preceding HTTP versions, to QUIC, a transport protocol built on top of UDP (hence HTTP/3 is also called QUIC HTTP). This eliminates head-of-line blocking between streams, reduces the size of network packets, and removes a lot of the overhead of TLS session establishment. However, these changes further complicate troubleshooting.
When it comes to representation selection, your app should support both a text and a binary format, and use the former for HTTP/1.1 and the latter for HTTP/2 and HTTP/3. When using HTTP/2 with a good binary representation, the total amount of network traffic will generally be 5-10% (sometimes less) of what it would have been in HTTP/1.1. The less data that needs to be sent, the less time spent waiting for it to transit the network and the less latency experienced by the user. It also reduces the network and compute resources needed by both client and server.
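The sketch below illustrates both halves of that paragraph. The request line and headers of an HTTP/1.1 message are plain, human-readable text. Binary data carried inside a text representation has to be base64-encoded, which turns every 3 bytes into 4 characters. The host, path, and JSON field name are hypothetical, and the payload is arbitrary test bytes.

```python
import base64
import json

# Arbitrary binary payload (standing in for, say, part of an image): 3072 bytes.
raw = bytes(range(256)) * 12

# A text (JSON) representation can only carry it as base64: 4 characters per 3 bytes.
b64 = base64.b64encode(raw)
body = json.dumps({"data": b64.decode("ascii")}).encode("utf-8")

# The request line and headers are plain text; the host and path are hypothetical.
request = (
    "POST /images HTTP/1.1\r\n"
    "Host: api.example.com\r\n"
    "Content-Type: application/json\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
).encode("ascii") + body

# Everything before the blank line is readable as-is in a text editor.
print(request[:request.index(b"\r\n\r\n")].decode("ascii"))
print(len(raw), len(b64), len(body))  # 3072 4096 4108
```

The 4/3 ratio is fixed by the base64 encoding itself. The JSON wrapper adds a few more bytes on top.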
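To show what multiplexing looks like on the wire, here is a toy sketch that builds and splits HTTP/2 DATA frames. It follows the 9-byte frame header from RFC 7540 (section 4.1): a 24-bit length, an 8-bit type, 8-bit flags, and a reserved bit plus a 31-bit stream identifier. Frames from two streams are interleaved on one “connection” (a byte string) and separated again by stream id. It deliberately ignores everything else HTTP/2 requires, such as HEADERS frames, HPACK, flow control, and settings, so it is an illustration only, not a usable implementation.

```python
import struct
from collections import defaultdict
from typing import Dict, List

DATA = 0x0        # HTTP/2 DATA frame type (RFC 7540, section 6.1)
END_STREAM = 0x1  # flag marking the last frame on a stream


def frame(stream_id: int, payload: bytes, end_stream: bool = False) -> bytes:
    """Build a DATA frame: 9-byte header (24-bit length, type, flags, 31-bit stream id) + payload."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length field")
    header = struct.pack("!I", len(payload))[1:]  # keep the low 3 bytes as the 24-bit length
    header += struct.pack("!BBI", DATA, END_STREAM if end_stream else 0, stream_id & 0x7FFFFFFF)
    return header + payload


def demultiplex(wire: bytes) -> Dict[int, bytes]:
    """Split interleaved frames from one connection back into per-stream bodies."""
    bodies: Dict[int, List[bytes]] = defaultdict(list)
    offset = 0
    while offset < len(wire):
        length = int.from_bytes(wire[offset:offset + 3], "big")
        _type, _flags, stream_id = struct.unpack("!BBI", wire[offset + 3:offset + 9])
        stream_id &= 0x7FFFFFFF  # ignore the reserved bit
        bodies[stream_id].append(wire[offset + 9:offset + 9 + length])
        offset += 9 + length
    return {sid: b"".join(parts) for sid, parts in bodies.items()}


# Two responses (streams 1 and 3) interleaved on a single connection.
wire = (frame(1, b"hello ") + frame(3, b"foo")
        + frame(1, b"world", end_stream=True) + frame(3, b"bar", end_stream=True))
print(demultiplex(wire))  # {1: b'hello world', 3: b'foobar'}
```

This binary framing is also why a packet capture is no longer readable in a text editor.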
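Here is a minimal sketch of that selection rule. It assumes the server exposes the negotiated protocol version as a string such as `"1.1"`, `"2"`, or `"3"`, which is an assumption about your framework. It picks JSON for HTTP/1.x and a compact binary encoding otherwise. A real service would more likely use Protocol Buffers or Avro for the binary side. Here `struct` stands in so the example stays dependency-free, and the binary media type and the point resource are hypothetical. In practice the client must also understand the binary format, so this would normally be combined with the `Accept` header.

```python
import json
import struct
from typing import Tuple

# Hypothetical media type for the binary encoding of a hypothetical "point" resource.
BINARY_MEDIA_TYPE = "application/x-example-point"
BINARY_LAYOUT = "!Idd"  # network byte order: unsigned 32-bit id, two 64-bit floats


def encode_point(point: dict, http_version: str) -> Tuple[str, bytes]:
    """Pick a text representation for HTTP/1.x and a binary one for HTTP/2 and HTTP/3."""
    if http_version.startswith("1."):
        return "application/json", json.dumps(point, separators=(",", ":")).encode("utf-8")
    return BINARY_MEDIA_TYPE, struct.pack(BINARY_LAYOUT, point["id"], point["lat"], point["lon"])


def decode_point(media_type: str, body: bytes) -> dict:
    """Inverse of encode_point, dispatching on the media type."""
    if media_type == "application/json":
        return json.loads(body)
    ident, lat, lon = struct.unpack(BINARY_LAYOUT, body)
    return {"id": ident, "lat": lat, "lon": lon}


point = {"id": 123456, "lat": 51.5007292, "lon": -0.1246254}
for version in ("1.1", "2"):
    media_type, body = encode_point(point, version)
    print(version, media_type, len(body))
```

The size difference for one tiny message like this says nothing general. It is not a reproduction of the traffic figures above.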
What’s Next
So, all the above provides some context and terminology needed to discuss the particulars of the use of HTTP for APIs. I’ll have follow-on posts around:
- some parts of the HTTP protocol semantics that I think people often get wrong or just miss
- aspects of creating good HTTP APIs (i.e., the URLs, methods, and headers) for both ReST and RPC services
- creating representations that can evolve over time
- HTTP’s caching mechanism
- implementing long-running jobs
- some aspects of authentication and authorization
So, if any of this sounds good to you then stay tuned.
* Throughout these posts I use the term “HTTP API” rather than terms like microservice, web service, and other commonly used terms, because many of them lack a formal definition, either in fact or in practice. Because of that, people have generally made up their own definitions that bake in a bunch of their own assumptions. “HTTP API” is hopefully different enough that it avoids those assumptions.