Arguments about HTTP 303 Considered Harmful

Ian recently published a blog post that he’d finally got around to writing, several months after a fierce internal debate at Talis about whether the Web of Data needs HTTP 303 redirects. I can top that. Ian’s post unleashed a flood of anti-303 sentiment that has prompted me to finish a blog post I started in February 2008.

Picture the scene: six geeks sit around a table in the bar of a Holiday Inn, somewhere in West London. It’s late, we’re drunk, and debating 303 redirects and the distinction between information and non-information resources. Three of the geeks exit stage left, leaving me to thrash it out with Dan and Damian. Some time shortly afterwards Dan calls me a “303 fascist”, presumably for advocating the use of HTTP 303 redirects when serving Linked Data, as per the W3C TAG’s finding on httpRange-14.

I never got to the bottom of Dan’s objection – technical? philosophical? historical? – but there seems to be no end to the hand-wringing that we as a community are willing to engage in about this issue.

Ian’s post lists nine objections to the 303 redirect pattern, most of which don’t stand up to closer scrutiny. Let’s take them one at a time:

1. it requires an extra round-trip to the server for every request

For whom is this an issue? Users? Data publishers? Both?

If it’s the former then the argument doesn’t wash. Consider a typical request for a Web page. The browser requests the page, the server sends the HTML document in response. (Wait, should that be “an HTML representation of the resource denoted by the URI”, or whatever? If we want to get picky about the terminology of Web architecture then let’s start with the resource/representation minefield. I would bet hard cash that the typical Web user or developer is better able to understand the distinction between information resources and non-information resources than between resources and representations of resources.)

Anyway, back to our typical request for a Web page… The browser parses the HTML document, finds references to images and stylesheets hosted on the same server, and quite likely some JavaScript hosted elsewhere. Each of the former requires another request to the original server, while the latter triggers requests to other domains. In the worst case scenario these other domains aren’t in the client’s (or their ISP’s) DNS cache, thereby requiring a DNS lookup on the hostname and increasing the overall time cost of the request.

In this context, is a 303 redirect and the resulting round-trip really an issue for users of the HTML interfaces to Linked Data applications? I doubt it.

Perhaps it’s an issue for data publishers. Perhaps those serving (or planning to serve) Linked Data are worried about whether their Web servers can handle the extra requests/second that 303s entail. If that’s the case, presumably the same data publishers insist that their development teams chuck all their CSS into a single stylesheet, in order to prevent any unnecessary requests stemming from using multiple stylesheets per HTML document. I doubt it.

My take-home message is this: in the grand scheme of things, the extra round-trip stemming from a 303 redirect is of very little significance to users or data publishers. Eyal Oren raised the very valid question some time ago of whether 303s should be cached. Redefining the cacheability of 303 responses in the HTTP spec seems eminently sensible. So why hasn’t it happened? If just a fraction of the time spent debating 303s and IR vs. NIR were spent lobbying to get that change made then we would have some progress to report. Instead we just have hand-wringing and FUD for potential Linked Data adopters.

2. only one description can be linked from the toucan’s URI

Do people actually want to link to more than one description of a resource? Perhaps there are multiple documents on a site that describe the same thing, and it would be useful to point to them both. (Wait, we have a mechanism for that! It’s called a hypertext/hyperdata link). But maybe someone has two static files on the same server that are both equally valid descriptions of the same resource. Yes, in that case it would be useful to be able to point to both; so just create an RDF document that sits behind a 303 redirect and contains some rdfs:seeAlso statements to the more extensive description, or serve up your data from an RDF store that can pull out all statements describing the resource, and return them as one document.
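As a rough illustration of the rdfs:seeAlso option, here is a small Python sketch that writes out the statements such a document could contain, as N-Triples. The example.org URIs and the function name are made up for illustration; only the rdfs:seeAlso property itself is real.

```python
from typing import List

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def see_also_triples(thing_uri: str, description_uris: List[str]) -> List[str]:
    """Return one N-Triples statement per extra description of the thing."""
    return [
        f"<{thing_uri}> <{RDFS_SEE_ALSO}> <{doc_uri}> ."
        for doc_uri in description_uris
    ]


# Hypothetical URIs: one toucan, two static files that both describe it.
triples = see_also_triples(
    "http://example.org/resource/toucan",
    [
        "http://example.org/files/toucan-short.rdf",
        "http://example.org/files/toucan-full.rdf",
    ],
)
print("\n".join(triples))
```

The document behind the 303 stays the single entry point, and the extra descriptions are just links in the data.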

I don’t buy the idea that people actually want to point to multiple descriptions apart from in the data itself. If there are other equivalent resources out there on the Web then state their equivalence, don’t just link to their descriptions. There may be 10 or 100 or 1000 equivalent resources referenced in an RDF document. 303 redirects make it very clear which is the authoritative description of a specific resource.

3. the user enters one URI into their browser and ends up at a different one, causing confusion when they want to reuse the URI of the toucan. Often they use the document URI by mistake.

OK, let’s break this issue down into two distinct scenarios: Joe Public, who wants to bookmark something, and Joe Developer, who wants to hand-craft some RDF (using the correct URI to identify the toucan).

Again, I would bet hard cash that Joe Public doesn’t want to reuse the URI of the toucan in his bookmarks, emails, tweets etc. I would bet that he wants to reuse the URI of the document describing the toucan. No one sends emails saying “hey, read this toucan”. People say “hey, read this document about a toucan”. In this case it doesn’t matter one bit that the document URI is being used.

Things can get a bit more complicated in the Joe Developer scenario, and the awful URI pattern used in DBpedia, where it’s visually hard to notice the change from /resource to /data or /page, doesn’t help at all. So change it. Or agree to never use that pattern again. If documents describing things in DBpedia ended .rdf or .html would we even be having this debate?
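To make the slip concrete, here is a rough Python sketch of a check Joe Developer’s tooling could run before writing a triple. It assumes nothing beyond the DBpedia layout mentioned above (/resource for the thing, /data and /page for documents about it); the function name is invented for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Path prefixes used for documents *about* a thing (the pattern described above).
DOCUMENT_PREFIXES = ("/data/", "/page/")
THING_PREFIX = "/resource/"


def thing_uri_for(uri: str) -> str:
    """Map a document URI back to the URI of the thing it describes.

    URIs that don't start with a known document prefix are returned unchanged.
    """
    parts = urlsplit(uri)
    for prefix in DOCUMENT_PREFIXES:
        if parts.path.startswith(prefix):
            new_path = THING_PREFIX + parts.path[len(prefix):]
            return urlunsplit(parts._replace(path=new_path))
    return uri


print(thing_uri_for("http://dbpedia.org/page/Toucan"))
print(thing_uri_for("http://dbpedia.org/data/Toucan"))
print(thing_uri_for("http://dbpedia.org/resource/Toucan"))
```

All three calls print the /resource URI, which is the one that belongs in the RDF.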

Joe Developer also has to take a bit of responsibility for writing sensible RDF statements. Unfortunately, people like Ed, who seems to conflate himself with his homepage (and his router with its admin console), don’t help with the general level of understanding. I’ve tried many times to explain to someone that I am not my homepage, and as far as I know I’ve never failed. In all this frantic debate about the 303 mechanism, let’s not abandon certain basic principles that just make sense.

I don’t think Ian was suggesting in his posts that he is his homepage, so let’s be very, very explicit about what we’re debating here — 303 redirects — and not muddy the waters by bringing other topics into the discussion.

4. it’s non-trivial to configure a web server to issue the correct redirect and only to do so for the things that are not information resources.

Ian claims this is non-trivial. Nor is running a Drupal installation. I know, it powers linkeddata.org, and maintaining it is a PITA. That doesn’t stop thousands of people doing it. Let’s be honest, very little in Web technology is trivial. Running a Web server in your basement isn’t trivial – that’s why people created wordpress.com, Flickr, MySpace, etc., bringing Web publishing to the masses, and why most of us would rather pay Web hosting companies to do the hard stuff for us. If people really see this configuration issue as a barrier then they should get on with implementing a server that makes it trivial, or teach people how to make the necessary configuration changes.
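In the spirit of teaching rather than hand-wringing, here is a minimal sketch of the pattern using nothing but Python’s standard library. The /resource/ and /data/ prefixes, the placeholder response body and the function names are all assumptions made for the example, not a recommended layout or anyone’s production setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

THING_PREFIX = "/resource/"  # hypothetical: URIs for things (non-information resources)
DOC_PREFIX = "/data/"        # hypothetical: URIs for documents describing those things


def route(path: str):
    """Return (status, headers) for a request path.

    Things get a 303 See Other pointing at their description;
    descriptions are ordinary documents served with a 200.
    """
    if path.startswith(THING_PREFIX):
        name = path[len(THING_PREFIX):]
        return 303, {"Location": DOC_PREFIX + name}
    if path.startswith(DOC_PREFIX):
        return 200, {"Content-Type": "text/turtle"}
    return 404, {}


class LinkedDataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = route(self.path)
        body = b""
        if status == 200:
            # Placeholder body; a real server would load the description
            # from a file or an RDF store.
            body = b"# description served for " + self.path.encode("utf-8") + b"\n"
        self.send_response(status)
        for key, value in headers.items():
            self.send_header(key, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch locally until interrupted."""
    HTTPServer(("localhost", port), LinkedDataHandler).serve_forever()
```

Calling serve() and requesting /resource/toucan should produce a 303 with a Location of /data/toucan, while /data/toucan itself returns a 200. The only decision the server makes is which prefix a path falls under.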

5. the server operator has to decide which resources are information resources and which are not without any precise guidance on how to distinguish the two (the official definition speaks of things whose “essential characteristics can be conveyed in a message”). I enumerate some examples here but it’s easy to get to the absurd.

The original guidance from the TAG stated that a 200 indicated an information resource, whereas a 303 could indicate any type of resource. If in doubt, use a 303 and redirect to a description of the resource. Simple.

6. it cannot be implemented using a static web server setup, i.e. one that serves static RDF documents

In this case hash URIs are more suitable anyway. This has always been the case.
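A quick sketch of why hash URIs work with static files, assuming a hypothetical birds.rdf document: the fragment is removed on the client side before the request is made, so the server only ever sees the document URI.

```python
from urllib.parse import urldefrag

thing_uri = "http://example.org/birds.rdf#toucan"  # hypothetical hash URI for the toucan

# The fragment never reaches the server: the client requests the document
# URI and resolves "#toucan" itself within whatever comes back.
document_uri, fragment = urldefrag(thing_uri)

print(document_uri)  # http://example.org/birds.rdf
print(fragment)      # toucan
```

No redirect and no server configuration are involved, since the thing and the document already have different URIs.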

7. it mixes layers of responsibility – there is information a user cannot know without making a network request and inspecting the metadata about the response to that request. When the web server ceases to exist then that information is lost.

Can’t this be resolved by adding additional triples to the document that describes the resource, stating the relationship between a resource and its description?
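As a sketch of what those additional triples might look like, the snippet below writes the relationship in both directions as N-Triples. Using FOAF’s primaryTopic and isPrimaryTopicOf properties is my choice for the example, and the example.org URIs are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def description_links(thing_uri: str, document_uri: str) -> str:
    """Return N-Triples linking a thing and the document that describes it."""
    return (
        f"<{thing_uri}> <{FOAF}isPrimaryTopicOf> <{document_uri}> .\n"
        f"<{document_uri}> <{FOAF}primaryTopic> <{thing_uri}> .\n"
    )


links = description_links(
    "http://example.org/resource/toucan",  # hypothetical thing URI
    "http://example.org/data/toucan",      # hypothetical document URI
)
print(links, end="")
```

With statements like these in the document, the relationship survives in any saved copy of the data, with or without the server that issued the redirect.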

8. the 303 response can really only be used with things that aren’t information resources. You can’t serve up an information resource (such as a spreadsheet) and 303 redirect to metadata about the spreadsheet at the same time.

Metadata about an RDF document can be included in the document itself. Perhaps a more Web-friendly alternative to Excel could allow for richer embeddable metadata.

9. having to explain the reasoning behind using 303 redirects to mainstream web developers simply reinforces the perception that the semantic web is baroque and irrelevant to their needs.

I fail to see how Ian’s proposal, when taken as a whole package, is any less confusing.

~~~

Having written this post I’m wondering whether the time would have been better spent on something more productive, which is precisely how I feel about the topic in general. As geeks I think we love obsessing about getting things “right”, but at what cost? Ian’s main objection seems to be about the barriers we put in the way of Linked Data adoption. From my own experience there is no better barrier than uncertainty. Arguments about HTTP 303s are far more harmful than 303s themselves. Let’s put the niggles aside and get on with making Linked Data the great success we all want it to be.

10 Responses to “Arguments about HTTP 303 Considered Harmful”


  1. Dave Reynolds

    In my experience of having many conversations derailed by this issue the killers are #3, #4 and #9.

    #3 It is very easy to follow links in any linked data explorer and to find the resource you want then accidentally use the doc instead of id URI to reference that resource in some other dataset. I’ve lost count of the number of times I’ve seen the result of that in data sets (and have done it myself sometimes even though I’m supposed to know better).

    #4 Is a show stopper for people who have insufficient access to their web servers, e.g. many people in a corporate environment, local authority or with a low end hosting provider. In contrast the latter can install a usable Drupal at one click of a button for most hosting environments.

    #9 I have some sympathy that the alternative which has emerged doesn’t help that much. Having to work magic over a content-location header is not *that* much easier to explain or do.

    Personally I’m happy with “let them use #”.

    But then personally I also think the need to make the distinction in the first place is overrated. Philosophically important but little practical consequence. I’ve never, in many years of this stuff, ever had occasion to want to attach metadata to the RDF source page as opposed to the concept (except maybe in separate provenance graphs where there is no ambiguity).

    Dave

  2. Leigh Dodds

    Hi Tom,

    You’ve given a good run-down of why you disagree with Ian’s assessment of the disadvantages of 303 and also stated the valid concern that others have raised: that any discussion of the topic may harm adoption of Linked Data.

    I won’t address your specific points, although I don’t agree with all of your assessments ;). It’s the second aspect I wanted to pick up on.

    It always concerns me when the “don’t go there” argument is raised on any technical issue as it smacks a little of inflexibility. It’s important to understand that the two currently favoured approaches to Linked Data do have their respective trade-offs.

    I think the discussion has been useful to tease out several of those trade-offs which really haven’t been itemised anywhere. The Cool URIs document is good, but not perfect, as a description of best practices.

    If we didn’t discuss those trade-offs then we wouldn’t be able to refine not only our understanding, but that of others in the (growing) community. It’s the essence of learning and iterative development. Permathreads are the outward signs of that happening within a community: it is often new members that trigger these debates.

    My feeling is that, as with anything on the web, we do need to be ready to re-evaluate technical recommendations and best practices. I see Ian’s proposal and the resulting discussion as a positive sign that the community is willing and able to do that in what has been a largely productive fashion.

    If we recognise that there *are* trade-offs in the different approaches, then it also seems reasonable to explore the alternatives. You haven’t actually touched on whether you think there are advantages or disadvantages to Ian’s proposal.

    My take-away from the discussion is that not only does it seem to have its own set of merits, but that it doesn’t seem to harm or impact any of the existing semantic web tools/applications. So it’s reasonable to consider it as an additional mechanism, especially as the IR/NIR distinction remains. The fact that it doesn’t seem to have any major impacts does make it harder to defend the 303 redirect mechanism: where is the benefit of that, if everything else (IR/NIR) remains?

    As you rightly point out, web developers have to habitually make design trade-offs whenever they build or deploy an application, so I doubt whether more choice will really be an issue.

    So, is this the kind of thing that the community should be debating? My feeling is that yes it is. There may well be other areas that need further attention to help spread adoption, but this is one issue that more than one person has bumped into, so worth the discussion in my book.

  3. Tom Heath

    @Dave: by “RDF source page” do you mean the RDF document describing the thing? If so then what about e.g. licensing information?

    @Leigh: I genuinely welcome discussion where it leads to genuine progress, including the definition of new best practices and standardisation where appropriate, and a migration path for those following previous best practices. What I find frustrating is argument for the sake of it. If the community, spurred on by Ian’s posts, can achieve the former then great. If not then the whole episode will remain another example of the latter.

  4. Ed Summers

    As a friend of mine commented on reading some of this httpRange-14 debate:

    “Feels like some sort of philosophical debate which should start over espresso and finish over wine when everyone’s agreed that the universe doesn’t exist.”

    If y’all got drunk and had fun talking about it I think 303 should be considered Helpful :-)

  5. Christopher Gutteridge

    It’s worth checking your assumptions every few years, but not worth starting a vi/emacs style war.

    I’m more concerned about the fact that you can ask a human for a URL to their homepage but not a URI to identify themselves. It doesn’t help that URL and URI are very similar TLAs and a URI looks like a URL and (often) returns HTML when resolved by a user.

    For example, how would wordpress ask someone for their URI so they could link identities of posters? (actually you could use the mailbox_sha so that’s a bad example… actually that’d be neat…)

    The best I’ve come up with is to show the URI in a text field, like the ‘embed’ html in youtube:
    http://eprints.ecs.soton.ac.uk/21687/ (see the URI/RDF button)

    I think that you need to recognise that the community is going to expand massively very soon and we need to be tolerant of a lot of naive questions.

    We might do better dropping “URL” and “URI” and finding more evocative names for them.

  6. Kingsley Idehen

    Bar initial mutual exclusion tone (re. existing 303 practices), I am of the opinion that Ian has unveiled an additional option that is dog-food heavy re. Linked Data.

    If we (as a community) espouse self-describing data, then we should demonstrate how self-description resolves ambiguity when dealing with a specific type of resolvable HTTP identifiers that are slash terminated.

    This option (like others before it) has pros and cons just like all the other options. Tweaking subject heading of initial post, and making the “option orientation” much clearer would have brought clarity to all of this much sooner. Basically, its distraction factor would have become non existent (IMHO).

    To conclude. It’s a nice additional option. Ultimately, implementors have to choose what works best for their use cases etc.

    Ian’s option breaks nothing. All options have to start by doing just that: break nothing.

    Kingsley

  7. Tom Heath

    @Ed: You have a point! Perhaps we can have a beer before too long and even talk about something else :)

    @Christopher: Re “I think that you need to recognise that the community is going to expand massively very soon and we need to be tolerant of a lot of naive questions.” Fantastic! I love asking (and trying to answer) naive questions, which in my experience often turn out to not be naive at all. What matters to me is that when the questions come, we, as a community, have a consistent response. These sorts of discussions trade consistency for the hope of a better solution. I’m not sure that’s a good bet.

  8. Scott Banwart's Blog » Blog Archive » Distributed Weekly 76

    [...] Arguments about HTTP 303 Considered Harmful [...]

  9. Jonathan Rochkind

    We often need to make assertions about things for which we don’t get to decide — or people don’t agree — which is the “authoritative” description.

    Perhaps I notice this most because of the domain I work in: libraries. Like the kind that have lots of books (and increasingly lots of ebooks too). I want to describe things for which the publisher (arguably the one ‘authorized’ to make an ‘authoritative’ description) has not provided any web representation. Or the publisher doesn’t even exist anymore, it’s an old book. Or the publisher’s web representation is crappy and not linked data at all. But I still need to create, maintain, and share data about these things.

    There are LOTS of domains that people want to create and share data about that don’t have obvious ‘authoritative representations’. Only if you stick to mostly talking about things originally created on the web in the first place do you avoid this problem.

    One of the selling points of linked data as described to me is that you don’t NEED an “authoritative” description. You can take assertions from several different sources you choose to trust, and combine them into a description.

    If you have to assume an ‘authoritative description’ for a given thing in order to make httpRange-14/303 seem workable, as you do explicitly in point 2 and I think is implicitly behind your analysis in some of the other points too, that seems like giving up the baby to save the bathwater.

  10. Proposed changes in VIAF RDF « Jakoblog — Das Weblog von Jakob Voß

    [...] And the Webpage that gives you information about the person can also get the same URI (see this article for a good defense of the HTTP-303 mess). Sure Semantic Web purists, which still dream of hard artificial intelligence, will disagree. But [...]