Ian recently published a blog post that he’d finally got around to writing, several months after a fierce internal debate at Talis about whether the Web of Data needs HTTP 303 redirects. I can top that. Ian’s post unleashed a flood of anti-303 sentiment that has prompted me to finish a blog post I started in February 2008.
Picture the scene: six geeks sit around a table in the bar of a Holiday Inn, somewhere in West London. It’s late, we’re drunk, and debating 303 redirects and the distinction between information and non-information resources. Three of the geeks exit stage left, leaving me to thrash it out with Dan and Damian. Some time shortly afterwards Dan calls me a “303 fascist”, presumably for advocating the use of HTTP 303 redirects when serving Linked Data, as per the W3C TAG’s finding on httpRange-14.
I never got to the bottom of Dan’s objection – technical? philosophical? historical? -Â but there is seemingly no end to the hand-wringing that we as a community seem willing to engage in about this issue.
Ian’s post lists nine objections to the 303 redirect pattern, most of which don’t stand up to closer scrutiny. Let’s take them one at a time:
1. it requires an extra round-trip to the server for every request
For whom is this an issue? Users? Data publishers? Both?
If it’s the former then the argument doesn’t wash. Consider a typical request for a Web page. The browser requests the page, the server sends the HTML document in response. (Wait, should that be “an HTML representation of the resource denoted by the URI“, or whatever? If we want to get picky about the terminology of Web architecture then lets start with the resource/representation minefield. I would bet hard cash that the typical Web user or developer is better able to understand the distinction between information resources and non-information resources than between resources and representations of resources).
In this context, is a 303 redirect and the resulting round-trip really an issue for users of the HTML interfaces to Linked Data applications? I doubt it.
Perhaps it’s an issue for data publishers. Perhaps those serving (or planning to serve) Linked Data are worried about whether their Web servers can handle the extra requests/second that 303s entail. If that’s the case, presumably the same data publishers insist that their development teams chuck all their CSS into a single stylesheet, in order to prevent any unnecessary requests stemming from using multiple stylesheets per HTML document. I doubt it.
My take home message is this: in the grand scheme of things, the extra round-trip stemming from a 303 redirect is of very little significance to users or data publishers. Eyal Oren raised the very valid question some time ago of whether 303s should be cached. Redefining this in the HTTP spec seems eminently sensible. So why hasn’t it happened? If just a fraction of the time spent debating 303s and IR vs. NIR was spent lobbying to get that change made then we would have some progress to report. Instead we just have hand-wringing and FUD for potential Linked Data adopters.
2. only one description can be linked from the toucan’s URI
Do people actually want to link to more than one description of a resource? Perhaps there are multiple documents on a site that describe the same thing, and it would be useful to point to them both. (Wait, we have a mechanism for that! It’s called a hypertext/hyperdata link). But maybe someone has two static files on the same server that are both equally valid descriptions of the same resource. Yes, in that case it would be useful to be able to point to both; so just create an RDF document that sits behind a 303 redirect and contains some rdfs:seeAlso statements to the more extensive description, or serve up your data from an RDF store that can pull out all statements describing the resource, and return them as one document.
I don’t buy the idea that people actually want to point to multiple descriptions apart from in the data itself. If there are other equivalent resources out there on the Web then state their equivalence, don’t just link to their descriptions. There may be 10 or 100 or 1000 equivalent resources referenced in an RDF document. 303 redirects make it very clear which is the authoritative description of a specific resource.
3. the user enters one URI into their browser and ends up at a different one, causing confusion when they want to reuse the URI of the toucan. Often they use the document URI by mistake.
OK, let’s break this issue down into two distinct scenarios. Job Public who wants to bookmark something, and Joe Developer who wants to hand-craft some RDF (using the correct URI to identify the toucan).
Again, I would bet hard cash that Joe Public doesn’t want to reuse the URI of the toucan in his bookmarks, emails, tweets etc. I would bet that he wants to reuse the URI of the document describing the toucan. No one sends emails saying “hey, read this toucan“. People say “hey, read this document about a toucan“. In this case it doesn’t matter one bit that the document URI is being used.
Things can get a bit more complicated in the Joe Developer scenario, and the awful URI pattern used in DBpedia, where it’s visually hard to notice the change from /resource to /data or /page, doesn’t help at all. So change it. Or agree to never use that pattern again. If documents describing things in DBpedia ended .rdf or .html would we even be having this debate?
Joe Developer also has to take a bit of responsibility for writing sensible RDF statements. Unfortunately, people like Ed seeming to conflate himself and his homepage (and his router and its admin console) don’t help with the general level of understanding. I’ve tried many times to explain to someone that I am not my homepage, and as far as I know I’ve never failed. In all this frantic debate about the 303 mechanism, let’s not abandon certain basic principles that just make sense.
I don’t think Ian was suggesting in his posts that he is his homepage, so let’s be very, very explicit about what we’re debating here — 303 redirects — and not muddy the waters by bringing other topics into the discussion.
4. its non-trivial to configure a web server to issue the correct redirect and only to do so for the things that are not information resources.
Ian claims this is non-trivial. Nor is running a Drupal installation. I know, it powers linkeddata.org, and maintaining it is a PITA. That doesn’t stop thousands of people doing it. Let’s be honest, very little in Web technology is trivial. Running a Web server in your basement isn’t trivial – that’s why people created wordpress.com, Flickr, MySpace, etc., bringing Web publishing to the masses, and why most of us would rather pay Web hosting companies to do the hard stuff for us. If people really see this configuration issue as a barrier then they should get on with implementing a server that makes it trivial, or teach people how to make the necessary configuration changes.
5. the server operator has to decide which resources are information resources and which are not without any precise guidance on how to distinguish the two (the official definition speaks of things whose “essential characteristics can be conveyed in a message”). I enumerate some examples here but it’s easy to get to the absurd.
The original guidance from the TAG stated that a 200 indicated an information resource, whereas a 303 could indicate any type of resource. If in doubt, use a 303 and redirect to a description of the resource. Simple.
6. it cannot be implemented using a static web server setup, i.e. one that serves static RDF documents
In this case hash URIs are more suitable anyway. This has always been the case.
7. it mixes layers of responsibility – there is information a user cannot know without making a network request and inspecting the metadata about the response to that request. When the web server ceases to exist then that information is lost.
Can’t this be resolved by adding additional triples to the document that describes the resource, stating the relationship between a resource and its description?
8. the 303 response can really only be used with things that aren’t information resources. You can’t serve up an information resource (such as a spreadsheet) and 303 redirect to metadata about the spreadsheet at the same time.
Metadata about an RDF document can be included in the document itself. Perhaps a more Web-friendly alternative to Excel could allow for richer embeddable metadata.
9. having to explain the reasoning behind using 303 redirects to mainstream web developers simply reinforces the perception that the semantic web is baroque and irrelevant to their needs.
I fail to see how Ian’s proposal, when taken as a whole package, is any less confusing.
Having written this post I’m wondering whether the time would have been better spent on something more productive, which is precisely how I feel about the topic in general. As geeks I think we love obsessing about getting things “right”, but at what cost? Ian’s main objection seems to be about the barriers we put in the way of Linked Data adoption. From my own experience there is no better barrier than uncertainty. Arguments about HTTP 303s are far more harmful than 303s themselves. Let’s put the niggles aside and get on with making Linked Data the great success we all want it to be.