Businesses Have Business Models, Dogs Don’t

You don’t have to live long in the Web world before you hear someone ask “What’s the business model for x?”, where x is any new technology or trend. An example that’s been doing the rounds for eternity is “What’s the business model for the Semantic Web?”. We saw a variation on this recently following the announcement that we (as in Talis) were winding down the Kasabi data marketplace and our generic Linked Data/Semantic Web consulting business. This prompted tweets along the lines of “just because Talis didn’t find the right business model for the Semantic Web doesn’t mean there isn’t one”.

While I sympathise with the sentiment, I find this kind of analysis, and the never-ending “what’s the business model for x?” questions, rather frustrating, for two reasons:

  1. The question is meaningless when it focuses on the wrong unit of analysis. Businesses have business models — arbitrary technologies, trends or concepts do not. Asking “What is the business model for the Semantic Web?” is like asking “What is the business model for football?” or “What is the business model for dogs?”.
    Football is just a game — it doesn’t have a business model! Sure, clubs and professional associations do, as will the business empires of successful players with lucrative pay and sponsorship deals — the crucial factor here is that they’re all businesses. Similarly, dogs don’t have a business model. Yes, you could breed dogs or feed dogs and choose to base a business on either, but there is no business model around the species itself. Just contemplating the question sounds ridiculous.
  2. Asking the question reveals a certain naivete about the relationship between technology and business, and I think that’s bad for us as a community. As Jeni Tennison’s recent post on Open Data Business Models highlights, novel business models are rare — the fundamentals of making money don’t change very often (in Web-time at least). Jeni focuses her analysis on Open Data publishers (rather than Open Data in general), which is good, but lists various options for revenue streams rather than business models per se.

If we’re serious about building businesses that exploit new technologies then our discussion of business models needs both the right unit of analysis (the business, not the technology) and the right depth of analysis (the broader model, not just the revenue streams). If we don’t want to engage in this degree of analysis then let’s ask a simpler, and probably more honest, question: “How can I/you/anyone make money from x?”.

Bebo White Reviews the Linked Data Book for Journal of Web Engineering

I recently had an email giving advance notice that a review of the Linked Data Book (aka “Linked Data: Evolving the Web into a Global Data Space“) would appear in Volume 11(2) of the Journal of Web Engineering, published by Rinton Press (ISSN: 1540-9589). As some people won’t have easy access to the journal, the review is republished here, with permission. It’s by Bebo White of Stanford University and beyond — thank you Bebo for the thoughtful review, and to Rinton Press for allowing it to be republished here.

Web Engineering has been described as encompassing those “technologies, methodologies, tools, and techniques used to develop and maintain Web-based applications leading to better systems, [thus to] enabling and improving the dissemination and use of content and services through the Web.” (Source: International Conference on Web Engineering)

An especially interesting aspect of this description is “dissemination and use of content.” Semantic Web technologies and particularly the Linked Data paradigm have evolved as powerful enablers for the transition of the current document-oriented Web into a Web of interlinked data/content and, ultimately, into the Semantic Web.

To facilitate this transition many aspects of distributed data and information management need to be adapted, advanced and integrated. Of particular importance are approaches for (1) extracting semantics from unstructured, semi-structured and existing structured sources, (2) management of large volumes of RDF data, (3) techniques for efficient automatic and semi-automatic data linking, (4) algorithms, tools, and inference techniques for repairing and enriching Linked Data with conceptual knowledge, (5) the collaborative authoring and creation of data on the Web, (6) the establishment of trust by preserving provenance and tracing lineage, (7) user-friendly means for browsing, exploration and search of large, federated Linked Data spaces. Particularly promising might be the synergistic combination of approaches and techniques touching upon several of these aspects at once.

For Web Engineering practitioners interested in being a part of this Web transition, Linked Data – Evolving the Web into a Global Data Space by Heath and Bizer will provide a valuable resource. The authors have done an excellent job of addressing the subject in a logical sequence of well-written chapters reflecting technical fundamentals, coverage of existing applications and tools, and the challenges for future development and research. The seven important approaches mentioned earlier are described in a consistent way and illustrated by means of a hypothetical scenario that evolves over the course of the book. The size of this book (122 pages) is deceiving in that it does not reflect the quality and density of its content. The authors have succeeded in presenting a complex topic both succinctly and clearly. It is not a “quick read,” but rather a volume to be used for references, definitions, and meaningful and instructive code examples.

This book is available in digital format (PDF). It is the first in a planned series of books/lectures. The quality of this book should make the reader/practitioner look forward to the upcoming series volumes that promise to further explain the exciting future of this topic.

Back Online after the Spam-fest

Just a quick post now this blog is back online after being badly compromised by spammers. I took everything down and let the links 404 for a while in the hope that it would encourage search engines to clear out their indexes, and the search engine referrals seem to be getting cleaner now, which is a relief. May this be the last of it.

The Linked Data Book: Draft Table of Contents

Update 2011-02-25: the book is now published and available for download and in hard copy:

Original Post

Chris Bizer and I have been working over the last few months on a book capturing the state of the art in Linked Data. The book will be published shortly as an e-book and in hard copy by Morgan & Claypool, as part of the series Synthesis Lectures in Web Engineering, edited by Jim Hendler and Frank van Harmelen. There will also be an HTML version available free of charge on the Web.

I’ve been asked about the contents, so thought I’d reproduce the table of contents here. This is the structure as we sent it to the publisher — the final structure may vary a little but changes will likely be superficial. Register at Amazon to receive an update when the book is released.

  • Overview
  • Contents
  • List of Figures
  • Acknowledgements
  • Introduction
    • The Data Deluge
    • The Rationale for Linked Data
      • Structure Enables Sophisticated Processing
      • Hyperlinks Connect Distributed Data
    • From Data Islands to a Global Data Space
    • Structure of this book
    • Intended Audience
    • Introducing Big Lynx Productions
  • Principles of Linked Data
    • The Principles in a Nutshell
    • Naming Things with URIs
    • Making URIs Dereferenceable
      • 303 URIs
      • Hash URIs
      • Hash versus 303
    • Providing Useful RDF Information
      • The RDF Data Model
        • Benefits of using the RDF Data Model in the Linked Data Context
        • RDF Features Best Avoided in the Linked Data Context
      • RDF Serialization Formats
        • RDF/XML
        • RDFa
        • Turtle
        • N-Triples
        • RDF/JSON
    • Including Links to other Things
      • Relationship Links
      • Identity Links
      • Vocabulary Links
    • Conclusions
  • The Web of Data
    • Bootstrapping the Web of Data
    • Topology of the Web of Data
      • Cross-Domain Data
      • Geographic Data
      • Media
      • Government Data
      • Libraries and Education
      • Life Sciences
      • Retail and Commerce
      • User Generated Content and Social Media
    • Conclusions
  • Linked Data Design Considerations
    • Using URIs as Names for Things
      • Minting HTTP URIs
      • Guidelines for Creating Cool URIs
        • Keep out of namespaces you do not control
        • Abstract away from implementation details
        • Use Natural Keys within URIs
      • Example URIs
    • Describing Things with RDF
      • Literal Triples and Outgoing Links
      • Incoming Links
      • Triples that Describe Related Resources
      • Triples that Describe the Description
    • Publishing Data about Data
      • Describing a Data Set
        • Semantic Sitemaps
        • voiD Descriptions
      • Provenance Metadata
      • Licenses, Waivers and Norms for Data
        • Licenses vs. Waivers
        • Applying Licenses to Copyrightable Material
        • Non-copyrightable Material
    • Choosing and Using Vocabularies
      • SKOS, RDFS and OWL
      • RDFS Basics
        • Annotations in RDFS
        • Relating Classes and Properties
      • A Little OWL
      • Reusing Existing Terms
      • Selecting Vocabularies
      • Defining Terms
    • Making Links with RDF
      • Making Links within a Data Set
        • Publishing Incoming and Outgoing Links
      • Making Links with External Data Sources
        • Choosing External Linking Targets
        • Choosing Predicates for Linking
      • Setting RDF Links Manually
      • Auto-generating RDF Links
        • Key-based Approaches
        • Similarity-based Approaches
  • Recipes for Publishing Linked Data
    • Linked Data Publishing Patterns
      • Patterns in a Nutshell
        • From Queryable Structured Data to Linked Data
        • From Static Structured Data to Linked Data
        • From Text Documents to Linked Data
      • Additional Considerations
        • Data Volume: How much data needs to be served?
        • Data Dynamism: How often does the data change?
    • The Recipes
      • Serving Linked Data as Static RDF/XML Files
        • Hosting and Naming Static RDF Files
        • Server-Side Configuration: MIME Types
        • Making RDF Discoverable from HTML
      • Serving Linked Data as RDF Embedded in HTML Files
      • Serving RDF and HTML with Custom Server-Side Scripts
      • Serving Linked Data from Relational Databases
      • Serving Linked Data from RDF Triple Stores
      • Serving RDF by Wrapping Existing Application or Web APIs
    • Additional Approaches to Publishing Linked Data
    • Testing and Debugging Linked Data
    • Linked Data Publishing Checklist
  • Consuming Linked Data
    • Deployed Linked Data Applications
      • Generic Applications
        • Linked Data Browsers
        • Linked Data Search Engines
      • Domain-specific Applications
    • Developing a Linked Data Mashup
      • Software Requirements
      • Accessing Linked Data URIs
      • Representing Data Locally using Named Graphs
      • Querying local Data with SPARQL
    • Architecture of Linked Data Applications
      • Accessing the Web of Data
      • Vocabulary Mapping
      • Identity Resolution
      • Provenance Tracking
      • Data Quality Assessment
      • Caching Web Data Locally
      • Using Web Data in the Application Context
    • Effort Distribution between Publishers, Consumers and Third Parties
  • Summary and Outlook
  • Bibliography

Arguments about HTTP 303 Considered Harmful

Ian recently published a blog post that he’d finally got around to writing, several months after a fierce internal debate at Talis about whether the Web of Data needs HTTP 303 redirects. I can top that. Ian’s post unleashed a flood of anti-303 sentiment that has prompted me to finish a blog post I started in February 2008.

Picture the scene: six geeks sit around a table in the bar of a Holiday Inn, somewhere in West London. It’s late, we’re drunk, and debating 303 redirects and the distinction between information and non-information resources. Three of the geeks exit stage left, leaving me to thrash it out with Dan and Damian. Some time shortly afterwards Dan calls me a “303 fascist”, presumably for advocating the use of HTTP 303 redirects when serving Linked Data, as per the W3C TAG’s finding on httpRange-14.

I never got to the bottom of Dan’s objection – technical? philosophical? historical? – but there is seemingly no end to the hand-wringing that we as a community seem willing to engage in about this issue.

Ian’s post lists nine objections to the 303 redirect pattern, most of which don’t stand up to closer scrutiny. Let’s take them one at a time:

1. it requires an extra round-trip to the server for every request

For whom is this an issue? Users? Data publishers? Both?

If it’s the former then the argument doesn’t wash. Consider a typical request for a Web page. The browser requests the page, the server sends the HTML document in response. (Wait, should that be “an HTML representation of the resource denoted by the URI”, or whatever? If we want to get picky about the terminology of Web architecture then let’s start with the resource/representation minefield. I would bet hard cash that the typical Web user or developer is better able to understand the distinction between information resources and non-information resources than between resources and representations of resources).

Anyway, back to our typical request for a Web page… The browser parses the HTML document, finds references to images and stylesheets hosted on the same server, and quite likely some JavaScript hosted elsewhere. Each of the former requires another request to the original server, while the latter triggers requests to other domains. In the worst case scenario these other domains aren’t in the client’s (or their ISP’s) DNS cache, thereby requiring a DNS lookup on the hostname and increasing the overall time cost of the request.

In this context, is a 303 redirect and the resulting round-trip really an issue for users of the HTML interfaces to Linked Data applications? I doubt it.

Perhaps it’s an issue for data publishers. Perhaps those serving (or planning to serve) Linked Data are worried about whether their Web servers can handle the extra requests/second that 303s entail. If that’s the case, presumably the same data publishers insist that their development teams chuck all their CSS into a single stylesheet, in order to prevent any unnecessary requests stemming from using multiple stylesheets per HTML document. I doubt it.

My take home message is this: in the grand scheme of things, the extra round-trip stemming from a 303 redirect is of very little significance to users or data publishers. Eyal Oren raised the very valid question some time ago of whether 303s should be cached. Redefining this in the HTTP spec seems eminently sensible. So why hasn’t it happened? If just a fraction of the time spent debating 303s and IR vs. NIR was spent lobbying to get that change made then we would have some progress to report. Instead we just have hand-wringing and FUD for potential Linked Data adopters.
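For anyone who wants to see exactly what this extra round-trip involves, here is a minimal Python sketch using the requests library. DBpedia’s toucan URI is used purely as an illustration; any URI served with the 303 pattern behaves the same way.

```python
import requests

# The URI of a real-world thing, served according to the 303 pattern.
# (DBpedia's toucan URI is just an illustration here.)
resource_uri = "http://dbpedia.org/resource/Toucan"

# Ask for RDF and switch off automatic redirect handling so the 303 is visible.
first = requests.get(resource_uri,
                     headers={"Accept": "application/rdf+xml"},
                     allow_redirects=False)
print(first.status_code)              # expect 303 See Other
print(first.headers.get("Location"))  # the URI of the description document

# The "extra" round-trip: fetching the description itself.
second = requests.get(first.headers["Location"],
                      headers={"Accept": "application/rdf+xml"})
print(second.status_code, second.headers.get("Content-Type"))
```

In practice a browser or RDF library follows the redirect automatically; that second request is the entire cost being debated.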

2. only one description can be linked from the toucan’s URI

Do people actually want to link to more than one description of a resource? Perhaps there are multiple documents on a site that describe the same thing, and it would be useful to point to them both. (Wait, we have a mechanism for that! It’s called a hypertext/hyperdata link). But maybe someone has two static files on the same server that are both equally valid descriptions of the same resource. Yes, in that case it would be useful to be able to point to both; so just create an RDF document that sits behind a 303 redirect and contains some rdfs:seeAlso statements to the more extensive description, or serve up your data from an RDF store that can pull out all statements describing the resource, and return them as one document.

I don’t buy the idea that people actually want to point to multiple descriptions apart from in the data itself. If there are other equivalent resources out there on the Web then state their equivalence, don’t just link to their descriptions. There may be 10 or 100 or 1000 equivalent resources referenced in an RDF document. 303 redirects make it very clear which is the authoritative description of a specific resource.
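To make that suggestion concrete, here is a minimal sketch of the kind of description document that could sit behind the 303, built with rdflib; the example.org URIs are hypothetical stand-ins for the thing and its additional descriptions.

```python
from rdflib import Graph, URIRef, RDFS

# Hypothetical URIs: the thing itself and two further documents describing it.
thing = URIRef("http://example.org/id/toucan")
more_docs = [URIRef("http://example.org/docs/toucan-extended.rdf"),
             URIRef("http://example.org/docs/toucan-habitat.rdf")]

g = Graph()
# The single document behind the 303 simply points onwards to the others.
for doc in more_docs:
    g.add((thing, RDFS.seeAlso, doc))

print(g.serialize(format="turtle"))
```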

3. the user enters one URI into their browser and ends up at a different one, causing confusion when they want to reuse the URI of the toucan. Often they use the document URI by mistake.

OK, let’s break this issue down into two distinct scenarios. Joe Public who wants to bookmark something, and Joe Developer who wants to hand-craft some RDF (using the correct URI to identify the toucan).

Again, I would bet hard cash that Joe Public doesn’t want to reuse the URI of the toucan in his bookmarks, emails, tweets etc. I would bet that he wants to reuse the URI of the document describing the toucan. No one sends emails saying “hey, read this toucan“. People say “hey, read this document about a toucan“. In this case it doesn’t matter one bit that the document URI is being used.

Things can get a bit more complicated in the Joe Developer scenario, and the awful URI pattern used in DBpedia, where it’s visually hard to notice the change from /resource to /data or /page, doesn’t help at all. So change it. Or agree to never use that pattern again. If documents describing things in DBpedia ended .rdf or .html would we even be having this debate?

Joe Developer also has to take a bit of responsibility for writing sensible RDF statements. Unfortunately, people like Ed, who seems to conflate himself with his homepage (and his router with its admin console), don’t help with the general level of understanding. I’ve tried many times to explain to someone that I am not my homepage, and as far as I know I’ve never failed. In all this frantic debate about the 303 mechanism, let’s not abandon certain basic principles that just make sense.

I don’t think Ian was suggesting in his posts that he is his homepage, so let’s be very, very explicit about what we’re debating here — 303 redirects — and not muddy the waters by bringing other topics into the discussion.

4. it’s non-trivial to configure a web server to issue the correct redirect and only to do so for the things that are not information resources.

Ian claims this is non-trivial. Neither is running a Drupal installation. I know, it powers linkeddata.org, and maintaining it is a PITA. That doesn’t stop thousands of people doing it. Let’s be honest, very little in Web technology is trivial. Running a Web server in your basement isn’t trivial – that’s why people created wordpress.com, Flickr, MySpace, etc., bringing Web publishing to the masses, and why most of us would rather pay Web hosting companies to do the hard stuff for us. If people really see this configuration issue as a barrier then they should get on with implementing a server that makes it trivial, or teach people how to make the necessary configuration changes.
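As a rough indication of how little is actually involved, here is a minimal sketch of such a server using Flask; the /id/ and /doc/ URI paths and the file naming are hypothetical choices, not a prescription.

```python
from flask import Flask, redirect, request

app = Flask(__name__)

# Every /id/... URI identifies a real-world thing (a non-information resource)
# and never returns 200 directly; it 303-redirects to a document about it,
# chosen by rudimentary content negotiation.
@app.route("/id/<name>")
def thing(name):
    accept = request.headers.get("Accept", "")
    if "application/rdf+xml" in accept or "text/turtle" in accept:
        return redirect(f"/doc/{name}.rdf", code=303)
    return redirect(f"/doc/{name}.html", code=303)

# The /doc/... URIs identify the descriptions themselves and return 200.
@app.route("/doc/<path:doc>")
def description(doc):
    # In a real deployment this would serve the stored RDF or HTML document.
    return f"Description document: {doc}"

if __name__ == "__main__":
    app.run()
```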

5. the server operator has to decide which resources are information resources and which are not without any precise guidance on how to distinguish the two (the official definition speaks of things whose “essential characteristics can be conveyed in a message”). I enumerate some examples here but it’s easy to get to the absurd.

The original guidance from the TAG stated that a 200 indicated an information resource, whereas a 303 could indicate any type of resource. If in doubt, use a 303 and redirect to a description of the resource. Simple.

6. it cannot be implemented using a static web server setup, i.e. one that serves static RDF documents

In this case hash URIs are more suitable anyway. This has always been the case.
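The reason hash URIs sit so comfortably with static hosting is that the fragment never reaches the server: the client strips it before making the request, so a single static document can describe many things without any redirect. A small sketch using Python’s standard library (with a hypothetical vocab.example.org namespace) makes the point:

```python
from urllib.parse import urldefrag

# A hash URI identifying a thing within a static RDF document.
thing_uri = "http://vocab.example.org/terms#Toucan"

document_uri, fragment = urldefrag(thing_uri)
print(document_uri)  # http://vocab.example.org/terms -- the only URI requested
print(fragment)      # Toucan -- resolved client-side, no redirect required
```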

7. it mixes layers of responsibility – there is information a user cannot know without making a network request and inspecting the metadata about the response to that request. When the web server ceases to exist then that information is lost.

Can’t this be resolved by adding additional triples to the document that describes the resource, stating the relationship between a resource and its description?
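For example (using hypothetical example.org URIs and the FOAF vocabulary), the description document could carry that relationship explicitly, so it survives independently of the server’s redirect behaviour:

```python
from rdflib import Graph, URIRef, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

thing = URIRef("http://example.org/id/toucan")   # the toucan itself
doc   = URIRef("http://example.org/doc/toucan")  # the document describing it

g = Graph()
# State the resource/description relationship in the data itself.
g.add((doc, FOAF.primaryTopic, thing))
g.add((thing, FOAF.isPrimaryTopicOf, doc))

print(g.serialize(format="turtle"))
```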

8. the 303 response can really only be used with things that aren’t information resources. You can’t serve up an information resource (such as a spreadsheet) and 303 redirect to metadata about the spreadsheet at the same time.

Metadata about an RDF document can be included in the document itself. Perhaps a more Web-friendly alternative to Excel could allow for richer embeddable metadata.

9. having to explain the reasoning behind using 303 redirects to mainstream web developers simply reinforces the perception that the semantic web is baroque and irrelevant to their needs.

I fail to see how Ian’s proposal, when taken as a whole package, is any less confusing.

~~~

Having written this post I’m wondering whether the time would have been better spent on something more productive, which is precisely how I feel about the topic in general. As geeks I think we love obsessing about getting things “right”, but at what cost? Ian’s main objection seems to be about the barriers we put in the way of Linked Data adoption. From my own experience there is no better barrier than uncertainty. Arguments about HTTP 303s are far more harmful than 303s themselves. Let’s put the niggles aside and get on with making Linked Data the great success we all want it to be.

Why Carry the Cost of Linked Data?

In his ongoing series of niggles about Linked Data, Rob McKinnon claims that “mandating RDF [for publication of government data] may be premature and costly“. The claim is made in reference to Francis Maude’s parliamentary answer to a question from Tom Watson. Personally I see nothing in the statement from Francis Maude that implies the mandating of RDF or Linked Data, only that “Where possible we will use recognised open standards including Linked Data standards”. Note the “where possible”. However, that’s not the point of this post.

There’s nothing premature about publishing government data as Linked Data – it’s happening on a large scale in the UK, US and elsewhere. Where I do agree with Rob (perhaps for the first time ;)) is that it comes at a cost. However, this isn’t the interesting question, as the same applies to any investment in a nation’s infrastructure. The interesting questions are who bears that cost, and who benefits?

Let’s make a direct comparison between publishing a data set in raw CSV format (probably exported from a database or spreadsheet) and making the extra effort to publish it in RDF according to the Linked Data principles.

Assuming that your spreadsheet doesn’t contain formulas or merged cells that would make the data irregularly shaped, or that you can create a nice database view that denormalises your relational database tables into one, then the cost of publishing data in CSV basically amounts to running the appropriate export of the data and hosting the static file somewhere on the Web. Dead cheap, right?

Oh wait, you’ll need to write some documentation explaining what each of the columns in the CSV file means, and what types of data people should expect to find in each of these. You’ll also need to create and maintain some kind of directory so people can discover your data in the crazy haystack that is the Web. Not quite so cheap after all.

So what are the comparable processes and costs in the RDF and Linked Data scenario? One option is to use a tool like D2R Server to expose data from your relational database to the Web as RDF, but let’s stick with the CSV example to demonstrate the lo-fi approach.

This is not the place to reproduce an entire guide to publishing Linked Data, but in a nutshell, you’ll need to decide on the format of the URIs you’ll assign to the things described in your data set, select one or more RDF schemata with which to describe your data (analogous to defining what the columns in your CSV file mean and how their contents relate to each other), and then write some code to convert the data in your CSV file to RDF, according to your URI format and the chosen schemata. Last of all, for it to be proper Linked Data, you’ll need to find a related Linked Data set on the Web and create some RDF that links (some of) the things in your data set to things in the other. Just as with conventional Web sites, if people find your data useful or interesting they’ll create some RDF that links the things in their data to the things in yours, gradually creating an unbounded Web of data.
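To give a flavour of what that conversion code might look like, here is a minimal sketch using Python’s csv module and rdflib. The file name, column names, URI pattern and the DBpedia link target are all hypothetical, and the FOAF terms simply stand in for whatever schemata you actually choose.

```python
import csv
from rdflib import Graph, Literal, Namespace, URIRef, RDF, OWL

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
BASE = "http://data.example.gov.uk/id/school/"   # the chosen URI format

g = Graph()
with open("schools.csv", newline="") as f:       # hypothetical CSV export
    for row in csv.DictReader(f):                # assumes id and name columns
        school = URIRef(BASE + row["id"])
        g.add((school, RDF.type, FOAF.Organization))
        g.add((school, FOAF.name, Literal(row["name"])))

# The step that makes it Linked Data: connect at least some of our things
# to things in an external data set (link target purely illustrative).
g.add((URIRef(BASE + "42"), OWL.sameAs,
       URIRef("http://dbpedia.org/resource/Example_School")))

g.serialize(destination="schools.rdf", format="xml")
```

The point is not the specific tooling, but that the column documentation, the identifiers and the links all end up in the data itself rather than in a separate README.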

Clearly these extra steps come at a cost compared to publishing raw CSV files. So why bear these costs?

There are two main reasons: discoverability and reusability.

Anyone (deliberately) publishing data on the Web presumably does so because they want other people to be able to find and reuse that data. The beauty of Linked Data is that discoverability is baked in to the combination of RDF and the Linked Data principles. Incoming links to an RDF data set put that data set “into the Web” and outgoing links increase the interconnectivity further.

Yes, you can create an HTML link to a CSV file, but you can’t link to specific things described in the data or say how they relate to each other. Linked Data enables this. Yes, you can publish some documentation alongside a CSV file explaining what each of the columns means, but that description can’t be interlinked with the data itself to make it self-describing. Linked Data does this. Yes, you can include URIs in the data itself, but CSV provides no mechanism for indicating that the content of a particular cell is a link to be followed. Linked Data does this. Yes, you can create directories or catalogues that describe the data sets available from a particular publisher, but this doesn’t scale to the Web. Remember what the arrival of Google did to the Yahoo! directory? What we need is a mechanism that supports arbitrary discovery of data sets by bots roaming the Web and building searchable indices of the data they find. Linked Data enables this.

Assuming that a particular data set has been discovered, what is the cost of any one party using that data in a new application? Perhaps this application only needs one data set, in which case all the developer must do is read the documentation to understand the structure of the data and get on with writing code. A much more likely scenario is that the application requires integration of two or more data sets. If each of these data sets is just a CSV file then every application developer must incur the cost of integrating them, i.e. linking together the elements common to both data sets, and must do this for every new data set they want to use in their application. In this scenario the integration cost of using these data sets is proportional to their use. There are no economies of scale. It always costs the same amount, to every consumer.

Not so with Linked Data, which enables the data publisher to identify links between their data and third party data sets, and make these links available to every consumer of that data set by publishing them as RDF along with the data itself. Yes, there is a one-off cost to the publisher in creating the links that are most likely to be useful to data consumers, but that’s a one-off. It doesn’t increase every time a developer uses the data set, and each developer doesn’t have to pay that cost for each data set they use.

If data publishers are seriously interested in promoting the use of their data then this is a cost worth bearing. Why constantly reinvent the wheel by creating new sets of links for every application that uses a certain combination of data sets? Certainly as a UK taxpayer, I would rather the UK Government made this one-off investment in publishing and linking RDF data, thereby lowering the cost for everyone that wanted to use them. This is the way to build a vibrant economy around open data.

The demise of community.linkeddata.org

The issue of what happened to the community.linkeddata.org site came up in this thread on the public-lod mailing list. In the name of the public record I’m posting some of the messages I have related to this issue. I’ll try and get any gaps filled in in due course (let me know if there are specific gaps of interest to you and I’ll try to fill them in); in the meantime I’m keen to get the key bits online.

Some background is here:
http://lists.w3.org/Archives/Public/public-lod/2008Apr/0096.html



from    Michael Hausenblas <michael.hausenblas@d...>
to    Ted Thibodeau Jr <tthibodeau@o...>
cc    Kingsley Idehen <kidehen@o...>,Tom Heath <tom.heath@t...>
date    9 February 2009 18:27
subject    Re: "powered by" logos on linkeddata.org MediaWiki

MacTed,

I'll likely not invest time anymore in the Wiki [the MediaWiki instance at community.linkeddata.org - TH]. The plan is to transfer everything to Drupal. We had a lot of hassle with the Wiki configuration and community contribution was rather low. After the spam attack we decided to close it. It only contains few valuable things (glossary and iM maybe) ..

Do you have an account at linkeddata.org Drupal, yet? Otherwise, Tom, would you please be so kind?

Again, sorry for the delay ... it's LDOW-paper-write-up time :)

Cheers,
Michael


--
Dr. Michael Hausenblas
DERI - Digital Enterprise Research Institute
National University of Ireland, Lower Dangan,
Galway, Ireland, Europe
Tel. +353 91 495730



> From: Ted Thibodeau Jr <tthibodeau@o...>
> Date: Fri, 6 Feb 2009 16:22:31 -0500
> To: Michael Hausenblas <michael.hausenblas@d...>
> Cc: Kingsley Idehen <kidehen@o...>
> Subject: "powered by" logos on linkeddata.org MediaWiki
>
> Hi, Michael --
>
> re: <http://community.linkeddata.org/MediaWiki/index.php?Main_Page>
>
> It appears that the "Powered by Virtuoso" logo that was once alongside
> the
> "Powered by Mediawiki" logo (lower right of every page) has disappeared
> from the main page boilerplate.  Can that get re-added, please?
>
> Please use this logo --
>
>
> <http://boards.openlinksw.com/support/styles/prosilver/theme/images/virt_power
> _no_border.png
>>
>
> -- and make it href link to --
>
>     <http://virtuoso.openlinksw.com/>
>
> Please let me know if there's any difficulty with this.
>
> Thanks,
>
> Ted
>
>
> --
> A: Yes.                      http://www.guckes.net/faq/attribution.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> Evangelism & Support         //        mailto:tthibodeau@o...
> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>                                   http://www.openlinksw.com/weblogs/uda/
> OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
>                                 http://www.openlinksw.com/blog/~kidehen/
>      Universal Data Access and Virtual Database Technology Providers




from Tom Heath
to Michael Hausenblas
date 9 March 2009 13:49
subject Re: http://linkeddata.org/domains?
mailed-by talisplatform.com

Hey Michael,

Re 2. great! I've created this node
and put it near the top of the primary navigation. You should be able
to write to that at will :)

Re 1. yes, good idea. I agree we should do this, just need to think
through the IA a little. Can you give me a day or so to chew this
over?

Cheers :)

Tom.




2009/3/7 Michael Hausenblas :
> Tom,
>
> As you may have gathered we're about to close down the 'old' community Wiki
> [1] and move over to [2]. There is not much active (and valuable) content at
> [1] and we had a lot of troubles with spammer (oh how I hate these ...).
>
> So, basically two things would be great:
>
> 1. I'd like to propose to add a sort of 'domain' or 'community' sub-space,
> such as http://linkeddata.org/domains where I can put our interlinking
> multimedia stuff [3] (and then change the redirect ;)
>
> 2. The second thing would be to find a place at [2] for the glossary [4] -
> seems quite helpful for people.
>
> Any thoughts?
>
>
> Cheers,
> Michael
>
> [1] http://community.linkeddata.org/MediaWiki/index.php?Main_Page
> [2] http://linkeddata.org/
> [3] http://www.interlinkingmultimedia.info/
> [4] http://community.linkeddata.org/MediaWiki/?Glossary
>
> --
> Dr. Michael Hausenblas
> DERI - Digital Enterprise Research Institute
> National University of Ireland, Lower Dangan,
> Galway, Ireland, Europe
> Tel. +353 91 495730
> http://sw-app.org/about.html
> http://webofdata.wordpress.com/



It’s quite hard to follow the indenting in the mail exchange below, so I’ve marked my contributions in bold.


from Tom Heath
to Kingsley Idehen
cc Michael Hausenblas
date 18 June 2009 17:07
subject Re: community.linkeddata.org
mailed-by talisplatform.com

Kingsley,

Also, what news of the previous instance?

Cheers,

Tom.

2009/6/18 Tom Heath :
> Hi Kingsley,
>
> Would the service you envisage at the subdomains you propose provide
> only a URI minting plus FOAF+SSL/OpenID service, or would other stuff
> also be available at that domain? If so, what?
>
> Tom.
>
>

> 2009/6/18 Kingsley Idehen :
>> Tom Heath wrote:
>>>
>>> Hi Kingsley,
>>>
>>> 2009/6/16 Kingsley Idehen :
>>>
>>>>
>>>> Tom Heath wrote:
>>>>
>>>>>
>>>>> Hi Kingsley,
>>>>>
>>>>> 2009/6/16 Kingsley Idehen :
>>>>>
>>>>>
>>>>>>
>>>>>> Tom Heath wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Hi Kingsley,
>>>>>>>
>>>>>>> According to our earlier discussions, this subdomain is deprecated in
>>>>>>> favour of the main site at linkeddata.org. If you'd a like a different
>>>>>>> subdomain for specific service just let me know.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>>
>>>>>>>

>>>>>>
>>>>>> What are the options?
>>>>>>
>>>>>>
>>>>>
>>>>> Guess that depends on the service you have in mind :) My goal is to
>>>>> avoid fragmentation of the presence at linkeddata.org and subdomains,
>>>>> so favour only creating new subdomains that do something highly
>>>>> specific and do not duplicate functionality or content available
>>>>> elsewhere.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Tom.
>>>>>
>>>>>
>>>>>

>>>>
>>>> I am not quite understanding you.
>>>>
>>>> What would you see as the scheme for an instance of ODS that gives LOD
>>>> members URIs (of the FOAF+SSL variety)?
>>>>
>>>> Personally, I have no particular interest in pushing this with you per
>>>> se.
>>>> If you somehow deem this unimportant, no problem, I move on etc..
>>>>
>>>
>>> So the proposal is for another equivalent ODS instance, but one that
>>> adds FOAF+SSL support?
>>>
>>> If so, then this does sound important, as FOAF+SSL seems to have lots
>>> to offer. The problem I'm trying to address is as follows: the
>>> feedback I got from people about the previous offering at
>>> community.linkeddata.org was that it was confusing. People didn't
>>> understand what was going on or being offered, and the end result
>>> seemed to be further fragmentation of Linked Data coverage -
>>> particularly problematic for newbies. Therefore a very trimmed down
>>> service offering just personal URIs with FOAF+SSL support would seem
>>> to be of benefit, but I'm not sure of the value of replicating the
>>> previous offering with enhancements.
>>>
>>> Thoughts?
>>>
>>> Incidentally, the previous instance seems to have died. Can it be
>>> reinstated while we finish porting the content across?
>>>
>>> Cheers,
>>>
>>> Tom.
>>>
>>>

>>
>> Tom,
>>
>> Goal is to have a place for people to easily obtain personal URIs. In a way,
>> official LOD community Web IDs.
>> FOAF+SSL is the most important feature here and LOD should be a launch pad.
>>
>> Possible options:
>> yourid.linkeddata.org
>> webid.linkeddata.org
>> me.linkeddata.org
>>
>>
>> This is how it works:
>>
>> 1. New Users open accounts
>> 2. Edit profile
>> 3. Click a button that makes an X.509 certificate, exports to browser, and
>> writes to FOAF space
>> 4. Member visits any FOAF+SSL or OpenID space on the Web and never has to
>> present uid/pwd
>>
>> For existing members, they simply perform steps 3-4.
>>
>>
>> --
>>
>>
>> Regards,
>>
>> Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
>> President & CEO OpenLink Software Web: http://www.openlinksw.com
>>
>
> --
> Dr Tom Heath
> Researcher
> Platform Division
> Talis Information Ltd
> T: 0870 400 5000
> W: http://www.talis.com/
>
--
Dr Tom Heath
Researcher
Platform Division
Talis Information Ltd
T: 0870 400 5000
W: http://www.talis.com/



Wash down the Apple tablet with a gulp of Kool Aid

I’m not in the least bit excited about the iPad, and it seems I’m not alone. The mood seems to have changed since before the launch, with countless tech journalists previously falling over themselves to declare tablets the next big thing. (Thankfully Rory Cellan-Jones from the BBC was more measured, focusing on personal projectors as a more exciting development). The mood since is considerably more downbeat, and I think more realistic.

I may be missing some crucial usage context that reveals the killer characteristics of the iPad, but I’ve tried really hard and still nothing. There are many obvious practical issues with the device:

  • it’s too big for a pocket, yet not sufficiently more useful than an iPhone or an HTC Hero to justify carrying it as well.
  • it’s about the same size as a compact laptop, but with less scope for comfortable rapid input.
  • it’s probably too big to cradle comfortably in my hand for prolonged periods, and sitting with one ankle on the other knee is not always practical.

The only scenarios I can conjure up where I could imagine using the device are:

  • showing people my holiday photos.
  • reviewing design proofs without needing to print them out.

Neither of these, nor even both together, is very compelling. TVs are getting good for viewing photos, now that many include an SD card slot, and rumours of the death of paper are greatly exaggerated.

Perhaps the most annoying thing about the scenarios used to promote the device is the one about the San Francisco to Tokyo flight, watching video all the way without running out of battery. Any airline with planes worth boarding has personal video screens. I don’t want to bring my own. I’d rather use the space to carry a decent pair of noise-cancelling headphones, which I’m sure increase my enjoyment of onboard media far more than a little bit of extra screen real estate. The development I want to see is not a new device that I have to prop on the flimsy airline table, hold tight when we hit some turbulence, and stow away when my food arrives, but the capability to connect my own device to the in-built screen via USB or Bluetooth. Even a bare USB port with power but no connectivity would be a start, allowing me to run low-powered devices (that I already own) during long flights.

OK, so the flight reference is just a touchstone for how long the device can run without mains power, but I think it demonstrates a lack of grounding of the device in realistic scenarios.

Any new device has to have two key characteristics these days for me to get excited: interoperability and convergence. The iPad seems to have very little of either. You could argue that it offers some convergence between smartphones and e-readers, but that’s about as exciting as convergence between a smartphone and a wall clock.

I’m left wondering what the iPad is competing against. I’m guessing it’s paper, whether that’s in the form of a book, brochure, newspaper, restaurant menu or whatever. Unfortunately for Apple, paper is pretty well suited to each of these, especially when you introduce bath water, the risk of theft, or just ketchup, into the equation. Perhaps this is the end of electronic picture frames as a dedicated device? Probably about time. Maybe the iPad will make an excellent Spotify console for the living room. Who knows? Whatever happens I can’t see this becoming a mass-market product worthy of even a fraction of the hype.

Where I wish that Apple had expended their creative talent was in addressing the power issue. Not in making sure I could watch 10 hours of back to back video, but in enabling me to spend that energy in whatever way I choose, powering whichever device I choose. It drives me crazy that I carry several batteries around, and short of running my phone off my laptop via USB there is no interoperability between these power sources. If Apple could produce a universal power supply that was sleek, sexy, efficient and interoperable, then I would be interested. Sadly this doesn’t seem to be the way.

Putting a Conference into the Semantic Web

Chris Gutteridge asked this question about semantically enabling conference Web sites, which is a subject close to my heart. It’s hard to give a meaningful response in 140 characters, so I decided to get some headline thoughts down for posterity. If you want a fuller account of some first-hand experiences, then the following papers are a good place to start:

Top Five Tips for Semantic Web-enabling a Conference

1. Exploit Existing Workflows

Conferences are incredibly data-rich, but much of this richness is bound up in systems for e.g. paper submission, delegate registration, and scheduling, that aren’t native to the Semantic Web. Recognise this in advance and plan for how you intend to get the data from these systems out into the Web. The good news is that scripts now exist to handle dumps from submission systems such as EasyChair, but you may need to ensure that the conference instance of these systems is configured correctly for your needs. For example, getting dumps from these systems often comes at a price, and if you’re using one instance per track rather than the multi-track options, you may be in for a shock when you ask for the dumps. Speak to the Programme Chairs about this as soon as possible.

In my experience, delegate registration opens months in advance of a conference and often uses a proprietary, one-off system. As early as possible make contact with the person who will be developing and/or running this system, and agree how the registration system can be extended to collect data about the delegates and their affiliations, for example. Obviously there needs to be an opt-in process before this data is published on the public Web.

Collecting these types of data from existing workflows is so monumentally easier than asking people to submit it later through some dedicated means. With this in mind, have modest expectations (in terms of degree of participation) for any system you hope to deploy for people to use before, during and after the conference, whether this is a personalised schedule planner, paper annotation system or rating system for local restaurants. People always have massive demands on their time, especially at a conference, so any system that isn’t already part of a workflow they are engaged with is likely to get limited uptake.

2. Publish Data Early then Incrementally Improve

Perhaps your goal in publishing RDF data about your conference is simply to do the right thing by eating your own dog food and providing an archival record of the event in machine-readable form. This is fine, but ideally you want people to use the published data before and during the event, not just afterwards. In an ideal world, people will use the data you publish as a foundation for demos of their applications and services at the conference, as a means to enhance the event and also to promote their own work. To maximise the chances of this happening you need to make it clear in advance that you will be publishing this data, and give an indication of what the scope of this will be. The RDF available from previous events in the ESWC and ISWC series can give an impression of the shape of the data you will publish (assuming you follow the same modelling patterns), but get samples out early and basic structures in place so people have the chance to prepare. Better to incrementally enhance something than save it all up for a big bang just one week before the conference.

3. Attend to the details

Many of the recent ESWC and ISWC events have done a great job of publishing conference data, and have certainly streamlined the process considerably. However, along the way we’ve lost (or failed to attend to) some of the small but significant facts that relate to a conference, such as the location, venue, sponsors and keynote speakers. This stuff matters, and is the kind of data that probably doesn’t get recorded elsewhere. Obviously publishing data about the conference papers is important, but from an archival point of view this information is at least recorded by the publishers of the proceedings. The more tacit, historical knowledge about a conference series may be of great interest in the future, but is at risk of slipping away.

4. Piggy-back on Existing Infrastructure

As I discovered while coordinating the Semantic Web Technologies for ESWC2006, deploying event-specific services is simply making a rod for your own back. Who is going to ensure these stay alive after the event is over and everyone moves on to the next thing? The answer is probably no-one. The domain registration will lapse, the server will get hacked or develop a fault, the person who once knew why that site mattered will take a job elsewhere, and the data will disappear in the process. Therefore it’s critical that every event uses infrastructure that is already embedded in everyday usage and also/therefore has a future. The best example of this is data.semanticweb.org, the de facto home for Linked Data from Web-related events. This service has support from SWSA, and enough buy-in from the community, to minimise the risk that it will ever go away. By all means host the data on the conference Web site if you must, but don’t dream of not mirroring it at data.semanticweb.org, with owl:sameAs links to equivalent URIs in that namespace for all entities in your data set.

5. Put Your Data in the Web

Remember that while putting your data on the Web for others to use is a great start, it’s going to be of greatest use to people if it’s also *in* the Web. This is a frequently overlooked distinction, but it really matters. No one in their right mind would dream of having a Web site with no incoming or outgoing links, and the same applies to data. Wherever possible the entities in your data set need to be linked to related entities in other data sets. This could be as simple as linking the conference venue to the town in which it is located, where the URI for the town comes from Geonames. Linking in this way ensures that consumers of the data can discover related information, and avoids you having to publish redundant information that already exists somewhere else on the Web. The really great news is that data.semanticweb.org already provides URIs for many people who have published in the Semantic Web field, and (aside from some complexities with special characters in names) linking to these really can be achieved in one line of code. When it’s this easy there really are no excuses.
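As a rough illustration of how little code that linking involves, here is a sketch with rdflib. The conference and person URIs are hypothetical, the Geonames identifier is a placeholder to be looked up on geonames.org, and the data.semanticweb.org URI follows that site’s person-URI pattern but should be checked against the live service.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import OWL

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

venue  = URIRef("http://example.org/myconf2012/venue")              # hypothetical
person = URIRef("http://example.org/myconf2012/person/jane-smith")  # hypothetical

g = Graph()
# The "one line of code": link the venue to its town in Geonames
# (replace 0000000 with the real Geonames identifier for the town).
g.add((venue, FOAF.based_near, URIRef("http://sws.geonames.org/0000000/")))

# And link a local person URI to the community-maintained one.
g.add((person, OWL.sameAs,
       URIRef("http://data.semanticweb.org/person/jane-smith")))

print(g.serialize(format="turtle"))
```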

Conclusions

Reading the above points back before I hit publish, I realise they focus on Semantic Web-enabling the conference as a whole, rather than specifically the conference Web site, which was the focus of Chris’s original question. I think we know a decent amount about publishing Linked Data on the Web, so hopefully these tips usefully address the more process-oriented than technical aspects.

Search Engine Optimisation for People with a Conscience

I’ve spent a fair amount of time recently cleaning up spammy reviews on Revyu, the Linked Data/Semantic Web reviewing and rating site. The main perpetrators of these spammy reviews seem to be self-appointed Search Engine Optimisation (SEO) “experts” (who even advertise themselves as such on LinkedIn). Their main strategy appears to be polluting the Web with links to fairly worthless sites, in the hope of gaining some share of search engine traffic.

Getting a piece of the action I have no objection to per se. This was exactly my aim with chiip.co.uk, my (currently somewhat on ice) shop window to Amazon – visitors could find products via search engines and, if desired, buy them through a trusted supplier, earning me enough commission on the side to pay my hosting bill for a month or two. The difference here is that I just tweaked the site layout to show off the content to search engines in its best light. I never polluted anyone else’s space to gain exposure. People that do this are getting me down.

Revyu has become somewhat popular as a target, presumably due to its decent ranking in the search engines. The site didn’t gain this position through spamming other sites with backlinks, but by having some simple principles baked into the site design from the start. They’re the same basic principles I’ve used on any site I’ve created, and have generally served me well. A few years ago I wrote down the principles that guide me, and I share this first draft here as a service to people who want to optimise the exposure of their site and still be able to sleep at night.

Before you read the tips, though, bear this in mind: there is something of an art to this, but it isn’t rocket science, and it certainly isn’t black magic. If you can create a Web site then you can optimise pretty well for search engines without paying a single self-appointed “expert” a single penny. This is bread and butter stuff. These approaches should be part of the core skill set of any Web developer rather than an afterthought addressed through some external process. The tips below are not guaranteed to work and may become defunct at any time (some may be defunct already – does anyone ever use frames these days?). However, follow these and you’ll be 80% of the way there.

Search Engine Optimisation Tips

  1. there’s only so much you can do, and this may change at any time
  2. don’t try and trick the search engines, just be honest
  3. use web standards and clean code
  4. use css for styling and layout
  5. put important text first in the page; let this influence your design, it’s probably what users want too, especially if they’re on non-standard browsers
  6. choose page titles carefully
  7. use meta tags, but only if they’re accurate
  8. use robot meta tags, and robots.txt
  9. use structural markup, especially headings
  10. give anchors sensible text (“click here” does not qualify as sensible)
  11. use link titles and alt text
  12. give files and folders meaningful names
  13. provide default pages in directories so people can hack your URLs
  14. forge meaningful (human) links with other sites, and make technical links accordingly
  15. encourage inward links to your site
    • make urls readable and linkable to
    • don’t break links (at least give redirects)
  16. don’t use javascript for links/popup windows that you want to be indexed
  17. avoid links embedded in flash movies
  18. never use frames
  19. never use cookies to power navigation
  20. give example searches or browse trees to open up databases to search engines
  21. maximise the content richness of pages
  22. avoid leaf node pages (always create links back to the rest of the site)
  23. limit the use of PDFs
  24. take common typos into account, or spelling variations (optimisation vs optimization is a good example)
  25. update the site regularly
  26. don’t use hidden text or comments to try and convey spam words
  27. don’t embed text in images
  28. avoid writing out text using javascript
  29. don’t use browser detection to alter content or restrict access
  30. provide meaningful error pages
  31. be realistic about what you can achieve optimisation-wise
  32. establish a traffic baseline
  33. use monitoring tools to track your progress

At some point I hope to provide evidence backing up each of these claims. In the meantime you’ll just have to trust me, but it won’t cost you anything.