I’m not a lawyer, but…. ISWC2009 Tutorial on Data Licensing

As the Linked Data community shifts its emphasis from publishing data on the Web to consuming it in applications, one question inevitably arises: “what are the terms under which different data sets can be reused?” There’s been a considerable amount of time and money invested by Talis and others in providing some clarity in this area; work that has evolved into the Open Data Commons. However, there remains a lot of confusion in this area, as evidenced by this thread on the Linking Open Data mailing list. Clearly there is more education and outreach work to be done about how licenses and waivers can be applied to data.

With this in mind Leigh Dodds, Jordan Hatcher, Kaitlin Thaney and I submitted a tutorial proposal to this year’s International Semantic Web Conference addressing exactly these kinds of issues. The good news is that our proposal has been accepted, and therefore there will be a half-day tutorial on “Legal and Social Frameworks for Sharing Data on the Web” at ISWC2009 in Washington DC in October (more details online soon). With Jordan present to provide the legal perspective there’ll finally be someone taking part in the discussion who can’t prefix their statements with “I’m not a lawyer, but…”

The Semantic Web is the Cake…but the Technologies are not the Layers

My last post about the relationship between Linked Data, the Semantic Web and the Semantic Web technology stack seemed to create more debate and disagreement than clarity. Not to be discouraged by this, I’ve been giving some more thought to analogies that may help to illuminate the relationship between different concepts in the Semantic Web space. This got me thinking about the Semantic Web layer cake.

The layer cake diagram is probably one of the most used and abused images associated with the Semantic Web vision. In the diagram, each technology or concept in the Semantic Web stack is a layer, with Crypto providing some kind of irregular icing down one side. I’d like to propose a different interpretation of the Semantic Web as a cake.

In my view, the technologies aren’t layers in the finished cake, they’re the raw ingredients that must be mixed and baked to make the cake that is the Semantic Web itself. URIs are the grains of flour; an ingredient that is essential but by itself rather bland, and lacking form and coherence. RDF triples are the egg that can bind together this URI flour. This cake is taking shape, but it’s lacking flavour. In the Semantic Web cake these flavourings, such as FOAF, SIOC, or the Programmes Ontology, are concocted on a base of RDFS and OWL. Simple cakes based on one or two flavours can be very tasty, but for our Semantic Web cake to be truly delicious we want a wide range of flavours, with some dominating others in different parts of the cake.

Once we’ve baked our cake, by putting our RDF data online according to the Linked Data principles, we’ll probably want to decorate it. Perhaps some icing or cherries on top, in the form of inferred RDF triples, would make it even more delicious. With such an appealing data cake it’s inevitable that people will want to consume it, but we have to make sure that everyone can have a slice rather than letting a big data glutton run off with the cake and deprive everyone else of this treat. We need some sort of knife; preferably one like SPARQL, that allows people to help themselves to the parts of the cake they like best. With the cake baked and decorated, and with all the tools in place, it’s time to invite some friends round (maybe even some agents) and start consuming.

(I see that Jim Hendler’s keynote at ESWC2009 will talk about the layer cake; I’m intrigued to see how he chooses to serve up the analogy).

Linked Data? Web of Data? Semantic Web? WTF?

This post was prompted by this tweet from Tim O’Reilly

People learning about Linked Data frequently ask “what’s the relationship between Linked Data and the Semantic Web?”, which is a fair and good question. One of the responses that crops up relatively frequently is that Linked Data is just an attempt to rebrand the Semantic Web. In my experience these kinds of rebranding comments come mostly from people who have a certain impression of the Semantic Web vision (which may or may not be accurate), don’t like this vision, and therefore dismiss Linked Data on this basis without actually considering what it means (i.e. a means to dismantle data silos), and without necessarily rethinking their original view of the Semantic Web concept. I prefer to see it this way…

Think about HTML documents; when people started weaving these together with hyperlinks we got a Web of documents. Now think about data. When people started weaving individual bits of data together with RDF triples (that expressed the relationship between these bits of data) we saw the emergence of a Web of data. Linked Data is no more complex than this – connecting related data across the Web using URIs, HTTP and RDF. Of course there are many ways to have linked data, but in common usage Linked Data refers to the principles set out by Tim Berners-Lee in 2006.
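To make the weaving a little more concrete, here is a minimal sketch in plain Python rather than a real RDF library. Triples are modelled as (subject, predicate, object) tuples. The two FOAF property URIs are real, but the people, the example.org and example.com URIs, and the helper names are hypothetical and exist only for illustration. The point it tries to show is that data published on two separate sites becomes one traversable graph as soon as both sides use the same URI for the same thing.

```python
# Illustrative sketch only: RDF triples modelled as plain tuples.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"  # real FOAF property
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"    # real FOAF property

# Two hypothetical data sets, published independently on different sites.
site_a = {
    ("http://example.org/people/alice", FOAF_NAME, "Alice"),
    ("http://example.org/people/alice", FOAF_KNOWS, "http://example.com/id/bob"),
}
site_b = {
    ("http://example.com/id/bob", FOAF_NAME, "Bob"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def names_of_friends(triples, person):
    """Follow foaf:knows links from person, then look up each friend's name."""
    names = []
    for friend in objects(triples, person, FOAF_KNOWS):
        names.extend(objects(triples, friend, FOAF_NAME))
    return names


# Merging is just set union, because both sites use the same URI for Bob.
web_of_data = site_a | site_b
```

With only `site_a` loaded, Alice's friend is a bare URI with no name attached; once `site_b` is merged in, the same lookup can follow the link and find the name. A real application would use an RDF toolkit and fetch the data over HTTP, which this sketch deliberately leaves out.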

So if we link data together using Web technologies, and according to these principles, the result is a Web of data. Personally I use the term Web of data largely interchangeably with the term Semantic Web, although not everyone in the Semantic Web world would agree with this. The precise term I use depends on the audience. With Semantic Web geeks I say Semantic Web, with others I tend to say Web of data – it’s not about rebranding, it’s about using terms that make sense to your audience, and Web of data speaks to people much more clearly than Semantic Web. Similarly, Linked Data isn’t about rebranding the Semantic Web, it’s about clarifying its fundamentals.

Tim Berners-Lee said several times last year, in public, that “Linked Data is the Semantic Web done right” (e.g. see these slides from Linked Data Planet in New York), and who am I to argue, it’s his vision. But to see this as a recent trend or a u-turn ignores the historical context. On page 191 of my copy of Weaving the Web (dated 2000, ISBN-13: 9781587990182) it says:

The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a web of data that can be processed directly or indirectly by machines.

I’m not sure this quote adequately captures the importance of links in the whole picture, but no one can claim that the Web of data label is recent marketing spin invented to make the Semantic Web palatable. This was always the deal. It’s certainly how I understood the concept (and what inspired me to do a PhD in the area).

If others have somehow diverted the Semantic Web vision down some side road since Weaving the Web was written, then that’s unfortunate. (In my experience the Linking Open Data project was an attempt to reconnect the Semantic Web community with some of the key aspects of the original vision that were being overlooked, like having a real Web of data as the basis for research). I certainly notice plenty of unjustified attempts at present to co-opt the term Semantic Web, now that it’s no longer a dirty word, and drive it off down some dodgy alleyway. Some of these products, services or companies may be applications or services that use some semantic technology and are delivered over the Web, but that doesn’t make them Semantic Web applications, services or companies. Anything claiming the Semantic Web label needs to get its hands dirty with Linked Data somewhere along the way. That’s just how it is.

So to return to Tim O’Reilly’s tweet, he’s not far wrong about the lack of difference between Linked Data, Semantic Web and RDF (we’ll ignore the means vs end vs technology distinction), but I’d love to know who he’s quoting about the explicit rebranding.

Linked Data Tutorials at Semantic Web Austin

I spent a few days last week in Austin, Texas, running two one-day tutorials about Linked Data. Juan was a great host, and the tutorials themselves were great fun (and significantly enhanced by the post-tutorial beers supplied by the very kind folk at The Guardian). I was incredibly impressed by the energy, enthusiasm and foresight of the members of the Semantic Web Austin interest group that Juan and John De Oliveira kick-started. The city itself has an amazing can-do attitude, and this was reflected in the diverse group of attendees at the tutorials. It was great to see so many completely new faces, and see first hand that Linked Data appeals well beyond the traditional Semantic Web community. If Juan’s energy is anything to go by I wouldn’t be at all surprised if Austin storms to the position of Semantic Web capital of the USA by the end of 2009. The slides from the tutorial are online on my site (PDF, 2.8M), and the photos of work and beer are on Flickr.

On Snake Oil

Greetings, from the shady corner of the marketplace, where dubious characters tell tales of substances with mystical properties, and push their wares on unsuspecting passersby…

Today I had the dubious privilege of being branded a snake oil salesman, on the grounds that my “boosting” of the Semantic Web isn’t backed up by adequate eating of my own Semantic Web dog food. Apparently neither my publications page, nor any of the other pages on my site, have any “intelligent content tagging”, whatever that is (I assume this means RDFa).

If it does mean RDFa, then true, but this does completely overlook the RDF/XML on my site, which as a whole is built according to Linked Data principles. More galling is that the claim completely overlooks the work I’ve done in the Semantic Web community that kick-started a lot of the ongoing dog food activity at ESWC and ISWC (this was not a lone effort by any means: Knud Moeller, Sean Bechhofer, Chris Bizer, Richard Cyganiak and many, many others deserve as much credit as I do, or more, for ensuring it continued, as will Michael Hausenblas and Harith Alani at ESWC2009). Just to rub salt into the wound, these inaccuracies are being propagated across the Web in other people’s blog comments, e.g. here.

I’d like to end with some insightful meta analysis or reflection on this, but unfortunately I need to get ready for a trip to the US to run a workshop on Visual Interfaces to the Social and Semantic Web and give two days’ worth of Linked Data tutorials. Hope I don’t get stopped at US customs with that consignment of snake oil ;) So, no great insights for now, just a copy of my response in case it ever disappears from the original site in a puff of smoke.

——–

Seth,

I would be the first to agree with you that the Semantic Web community has not always eaten its own dogfood to the extent that it should have. It was for exactly this reason that in 2006 I produced RDF descriptions of almost all aspects of the European Semantic Web Conference (http://www.eswc2006.org/rdf/) and coordinated the deployment of numerous Semantic Web technologies at the conference (http://www.eswc2006.org/technologies/). My aim was to learn about deploying these technologies in the wild, and feed back my findings (positive or negative) to the community. The results of my evaluation were published here: http://swui.semanticweb.org/swui06/papers/Heath/Heath.pdf

Regarding the production of RDF to describe Semantic Web conferences, there had been some small efforts in this direction at previous events, but nothing comprehensive. ESWC2006 changed that for good, and there have been RDF descriptions of all European and International Semantic Web conferences published ever since. This data has been published using an ontology that derives largely from the one I created for ESWC2006, with significant contributions along the way from others. There is now a regular position on the organising committee of these conferences for people charged with coordinating this effort for the event. Knud Moeller and I shared this role at ISWC2007, where we also reported back to the community on our efforts up to that point: http://iswc2007.semanticweb.org/papers/795.pdf. Many other people have contributed significantly along the way, and this combined effort has produced the repository of data at http://data.semanticweb.org/ to which RDF descriptions of ESWC2009 will also be added.

But as you point out, the institutions that promote the Semantic Web also need to put their money where their mouth is. Agreed. While I was a PhD student at The Open University’s Knowledge Media Institute, I argued for developer time to add RDF descriptions about all KMi members to the institute’s People pages (http://kmi.open.ac.uk/people/), and tutored the developers in how to apply their existing Web development skills to exposing Semantic Markup.

My PhD work included development of the reviewing and rating site Revyu.com (http://revyu.com/), which won first prize in the 2007 Semantic Web Challenge. I can’t speak for the judges, but my hunch is that a major factor in Revyu’s success in the Challenge stemmed from its strict adherence to the Linked Data principles (http://www.w3.org/DesignIssues/LinkedData.html), which have done so much to help people make the Semantic Web a reality. Revyu publishes human-readable (i.e. HTML) and machine-readable (i.e. RDF) content side by side, but humans won’t see this RDF (I assume this is what you mean by “intelligent content tagging”) unless they know where to look; this is the intended behaviour, and works according to the techniques described in the How to Publish Linked Data on the Web tutorial (http://linkeddata.org/docs/how-to-publish) that I co-authored with Chris Bizer and Richard Cyganiak.

My personal Web site follows the same principles and uses the same techniques. If you view the source of my homepage you will see a link tag in the header that looks like this:
<link rel="meta" type="application/rdf+xml" title="RDF" href="http://tomheath.com/home/rdf" />
This is the link that tells Semantic Web crawlers to look elsewhere for the semantic markup on my site, not in the human-readable HTML page where it might get broken if I tweak the layout. If we ever meet in person I will give you one of my business cards, which doesn’t give the address of my homepage – it gives my Web URI (http://tomheath.com/id/me); humans and machines can look up this URI and retrieve information about me in a form that suits them (i.e. HTML or RDF), and follow links in that HTML or RDF to other related information. In the words of Tim Berners-Lee, this setup is “the Semantic Web done right, and the Web done right”.

Yes, it’s unfortunate that my publications page doesn’t have an RDF equivalent; perhaps I’ve been too busy investing time and energy in initiatives that will have an impact beyond the scope of my own Web site? But either way, your comment that “nor any of his other pages (that I saw) uses any form of intelligent content tagging” just doesn’t stack up. Before you make these sorts of claims I would ask, reasonably and politely, that you show due diligence in looking thoroughly, and in the right places, for the semantic markup on my site. For anyone who is in any doubt that it’s there, click on the small “RDF META” tile on the right hand side of pages on my site.

I think it’s also reasonable to expect, if you’re truly interested in how the Semantic Web community is tackling this issue, that I might be given the chance to respond to your queries in advance of this article going out, as Adrian Paschke and Alexander Wahler were. I can only hope that this response helps provide a fuller picture of the situation, with respect to my efforts and those of the community at large.

Lastly, a technical point. We need to remember that the Semantic Web allows anyone to say anything, anywhere (I’m borrowing from Dean Allemang and Jim Hendler here). So, while RDF data about my publications may not be available on my own site yet, you can find pieces of the jigsaw at data.semanticweb.org, and if all conferences and journals published their proceedings/tables of contents in RDF, then my job would simply be to join the pieces together, and I wouldn’t be faced with manually updating my list of publications. OK, so we’re not there quite yet. Yes, there’s work to be done, but we’re trying.

Tom.
——–
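
For anyone wondering how a crawler might act on the link tag and Web URI mentioned in that response, here is a rough Python sketch using only the standard library. The class and function names are my own inventions for illustration, and the assumption that a server will answer an `Accept: application/rdf+xml` header with RDF is exactly that, an assumption. The sketch parses a page for the RDF autodiscovery link, and prepares (but does not send) a request that asks for RDF rather than HTML.

```python
from html.parser import HTMLParser
from urllib.request import Request

RDF_XML = "application/rdf+xml"


class MetaLinkFinder(HTMLParser):
    """Collect href values of <link rel="meta" type="application/rdf+xml"> tags."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here by default.
        attributes = dict(attrs)
        if (
            tag == "link"
            and attributes.get("rel") == "meta"
            and attributes.get("type") == RDF_XML
            and attributes.get("href")
        ):
            self.rdf_links.append(attributes["href"])


def find_rdf_links(html):
    """Return the RDF document URIs advertised in the given HTML source."""
    finder = MetaLinkFinder()
    finder.feed(html)
    return finder.rdf_links


def rdf_request(uri):
    """Build a request asking for RDF/XML; nothing is sent over the network."""
    return Request(uri, headers={"Accept": RDF_XML})


page = (
    '<html><head><link rel="meta" type="application/rdf+xml" title="RDF" '
    'href="http://tomheath.com/home/rdf" /></head><body></body></html>'
)
rdf_documents = find_rdf_links(page)
request_for_me = rdf_request("http://tomheath.com/id/me")
```

Passing `request_for_me` to `urllib.request.urlopen` would perform the actual lookup; whether HTML or RDF comes back depends entirely on how the server is configured.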


Raw Data Now Dot Com

Last summer I registered the domain name rawdatanow.com. I was at Linked Data Planet in New York, listening to TimBL give his keynote, and was struck by his rallying cry for RAW DATA NOW!! The idea made perfect sense: concentrate on getting your data out (soundtrack by danja), and worry about the shiny interface later; or better still, by publishing the data according to Web standards and Linked Data principles you empower others to create the shiny interfaces that are meaningful and useful to them.

According to TimBL’s slides from TED the meme comes from this post from Rufus Pollock, but the clarity of the call was new to me (update: according to delicious I bookmarked Rufus’s post on 14th Nov 2007; obviously wasn’t paying attention properly) and it encouraged me to start the Linked Data Shopping List (which could do with some more attention). I also had in mind a grassroots campaign to promote the idea (full page ads in national newspapers signed by as many people as we could get, that sort of thing), which is why I bought the domain name. But it’s been a busy year, and between the ongoing efforts to sustain and increase the momentum of the Linked Data movement, and trying to get VoCamp off the ground, I’ve had no spare time to devote to more community activities.

So, here’s my offer to the community: if you have a compelling story to tell about how we can encourage huge numbers of organisations and individuals to provide RAW DATA NOW, and being able to use rawdatanow.com would help with that, please let me know by blog comment, email (firstname.surname@gmail.com) or identi.ca.

Update (2009-03-13): the domain http://rawdatanow.com/ now points to TimBL’s talk from TED2009, but the offer still stands.

Making My Placard: UK Email Retention Plans

I’ve nothing much to add to this story - The UK government’s plans to retain email data and rate online content will cost too much, destroy business, liberty and must be stopped – start making placards – except my support for any peaceful demonstrations that get planned. Just posting it here in case even one additional person finds out about the plans. More also from the BBC: http://news.bbc.co.uk/1/hi/uk/7819230.stm. Off to think of a witty slogan for my placard.

New Year, New Blog

I got a bit bored of the limited functionality of my old blog at http://my.opera.com/tomheath/blog/ — that platform just wasn’t keeping pace with the state of the art in blogging software, which is a shame, as I specifically started using it because of the FOAF output — so have decided to switch to a WordPress install here at tomheath.com. I’ve managed to import the 20 most recent posts from the old blog using the WordPress import tool for RSS (hooray for data in reusable formats) but not the earlier ones as there is no RSS feed for those (boo to that needless limitation). There’s still some cleanup to do on the imported posts, some general housekeeping tasks and a decent theme to choose, which will be a good displacement activity for me over the next few weeks. Any suggestions for essential WP plugins to install would be welcomed.

Thesis Posted Online

I’ve finally posted the final version of my PhD thesis (“Information-seeking on the Web with Trusted Social Networks – from Theory to Systems”) online at http://tomheath.com/thesis/heath-thesis-information-seeking-trusted-social-networks.pdf. I also intend to produce a compact formatted version for people who would rather print it than read the PDF. More on that later.

Yes, the Semantic Web does matter, and RDF is a key part of that picture

Paul Miller has a nice new post over at ZDnet, entitled Does the Semantic Web matter? He ultimately concludes ‘yes’, and I agree, but some of the details raised an eyebrow for me.

“Continuing landgrabs by startups that seek to attract, trap and exploit eyeballs stand unashamedly on the shoulders of Semantic Web promise whilst running counter to its basic tenets of linking and openness. On the other hand, companies ‘just’ doing perfectly reasonable – and valuable – things with the meanings of words, phrases and documents latch on to the Semantic Web’s buzz, whilst being all about Semantics and not at all about the Web.”

I have to agree, almost violently, with both these points.

One passage I can’t agree with however is this:

“The speed with which ‘RDF’ or ‘OWL’ enter any conversation about the Semantic Web is worrying; and must ultimately prove self-defeating as potential adopters retreat from a barrage of terminology and an opaque glut of unnecessary detail.”

This may be a fair criticism with regard to OWL, but saying this with regard to RDF is like criticising discussions of the Web in the early 90s for quickly coming down to details of HTML. Yes, we need to focus on what we can do with the technology, but let’s not kick back too hard against discussing the technical details.

URIs and the RDF data model are exactly what enables the Semantic Web “proper” to address the issue of linking that Paul rightly criticises many startups for not properly addressing. We can’t hope to understand or predict the emergent properties of a Semantic Web without understanding the fundamental components of that Web, and right now RDF is about as fundamental as the components come.