Archive for the 'Uncategorized' Category

Businesses Have Business Models, Dogs Don’t

You don’t have to live long in the Web world before you hear someone ask “What’s the business model for x?”, where x is any new technology or trend. An example that’s been doing the rounds for eternity is “What’s the business model for the Semantic Web?”. We saw a variation on this recently following the announcement that we (as in Talis) were winding down the Kasabi data marketplace and our generic Linked Data/Semantic Web consulting business. This prompted tweets along the lines of “just because Talis didn’t find the right business model for the Semantic Web doesn’t mean there isn’t one”.

While I sympathise with the sentiment, I find this kind of analysis, and the never-ending “what’s the business model for x?” questions, rather frustrating, for two reasons:

  1. The question is meaningless when it focuses on the wrong unit of analysis. Businesses have business models — arbitrary technologies, trends or concepts do not. Asking “What is the business model for the Semantic Web?” is like asking “What is the business model for football?” or “What is the business model for dogs?”.
    Football is just a game — it doesn’t have a business model! Sure, clubs and professional associations do, as will the business empires of successful players with lucrative pay and sponsorship deals — the crucial factor here is that they’re all businesses. Similarly, dogs don’t have a business model. Yes, you could breed dogs or feed dogs and choose to base a business on either, but there is no business model around the species itself. Just contemplating the question sounds ridiculous.
  2. Asking the question reveals a certain naïveté about the relationship between technology and business, and I think that’s bad for us as a community. As Jeni Tennison’s recent post on Open Data Business Models highlights, novel business models are rare — the fundamentals of making money don’t change very often (in Web-time at least). Jeni focuses her analysis on Open Data publishers (rather than Open Data in general, which is good), but lists various options for revenue streams rather than business models per se.

If we’re serious about building businesses that exploit new technologies then our discussion of business models needs both the right unit of analysis (the business, not the technology) and the right depth of analysis (the broader model, not just the revenue streams). If we don’t want to engage in this degree of analysis then let’s ask a simpler, and probably more honest, question: “How can I/you/anyone make money from x?”.

Back Online after the Spam-fest

Just a quick post now this blog is back online after being badly compromised by spammers. I took everything down and let the links 404 for a while in the hope that it would encourage search engines to clear out their indexes, and the search engine referrals seem to be getting cleaner now, which is a relief. May this be the last of it.

The demise of

The issue of what happened to the site came up in this thread on the public-lod mailing list. In the name of the public record I’m posting some of the messages I have related to this issue. I’ll try and get any gaps filled in in due course (let me know if there are specific gaps of interest to you and I’ll try to fill them in); in the meantime I’m keen to get the key bits online.

Some background is here:

from    Michael Hausenblas <michael.hausenblas@d...>
to    Ted Thibodeau Jr <tthibodeau@o...>
cc    Kingsley Idehen <kidehen@o...>,Tom Heath <tom.heath@t...>
date    9 February 2009 18:27
subject    Re: "powered by" logos on MediaWiki


I'll likely not invest time anymore in the Wiki [the MediaWiki instance at - TH]. The plan is to transfer everything to Drupal. We had a lot of hassle with the Wiki configuration and community contribution was rather low. After the spam attack we decided to close it. It only contains few valuable things (glossary and iM maybe) ..

Do you have an account at Drupal, yet? Otherwise, Tom, would you please be so kind?

Again, sorry for the delay ... it's LDOW-paper-write-up time :)


Dr. Michael Hausenblas
DERI - Digital Enterprise Research Institute
National University of Ireland, Lower Dangan,
Galway, Ireland, Europe
Tel. +353 91 495730

> From: Ted Thibodeau Jr <tthibodeau@o...>
> Date: Fri, 6 Feb 2009 16:22:31 -0500
> To: Michael Hausenblas <michael.hausenblas@d...>
> Cc: Kingsley Idehen <kidehen@o...>
> Subject: "powered by" logos on MediaWiki
> Hi, Michael --
> re: <>
> It appears that the "Powered by Virtuoso" logo that was once alongside
> the
> "Powered by Mediawiki" logo (lower right of every page) has disappeared
> from the main page boilerplate.  Can that get re-added, please?
> Please use this logo --
> <
> _no_border.png
> -- and make it href link to --
>     <>
> Please let me know if there's any difficulty with this.
> Thanks,
> Ted
> --
> A: Yes.            
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
> Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> Evangelism & Support         //        mailto:tthibodeau@o...
> OpenLink Software, Inc.      //    
> OpenLink Blogs    
>      Universal Data Access and Virtual Database Technology Providers

from Tom Heath
to Michael Hausenblas
date 9 March 2009 13:49
subject Re:

Hey Michael,

Re 2. great! I've created this node
and put it near the top of the primary navigation. You should be able
to write to that at will :)

Re 1. yes, good idea. I agree we should do this, just need to think
through the IA a little. Can you give me a day or so to chew this

Cheers :)


2009/3/7 Michael Hausenblas :
> Tom,
> As you may have gathered we're about to close down the 'old' community Wiki
> [1] and move over to [2]. There is not much active (and valuable) content at
> [1] and we had a lot of troubles with spammer (oh how I hate these ...).
> So, basically two things would be great:
> 1. I'd like to propose to add a sort of 'domain' or 'community' sub-space,
> such as where I can put our interlinking
> multimedia stuff [3] (and then change the redirect ;)
> 2. The second thing would be to find a place at [2] for the glossary [4] -
> seems quite helpful for people.
> Any thoughts?
> Cheers,
> Michael
> [1]
> [2]
> [3]
> [4]
> --
> Dr. Michael Hausenblas
> DERI - Digital Enterprise Research Institute
> National University of Ireland, Lower Dangan,
> Galway, Ireland, Europe
> Tel. +353 91 495730

It’s quite hard to follow the indenting in the mail exchange below, so I’ve marked my contributions in bold.

from Tom Heath
to Kingsley Idehen
cc Michael Hausenblas
date 18 June 2009 17:07
subject Re:


Also, what news of the previous instance?



2009/6/18 Tom Heath :
> Hi Kingsley,
> Would the service you envisage at the subdomains you propose provide
> only a URI minting plus FOAF+SSL/OpenID service, or would other stuff
> also be available at that domain? If so, what?
> Tom.

> 2009/6/18 Kingsley Idehen :
>> Tom Heath wrote:
>>> Hi Kingsley,
>>> 2009/6/16 Kingsley Idehen :
>>>> Tom Heath wrote:
>>>>> Hi Kingsley,
>>>>> 2009/6/16 Kingsley Idehen :
>>>>>> Tom Heath wrote:
>>>>>>> Hi Kingsley,
>>>>>>> According to our earlier discussions, this subdomain is deprecated in
>>>>>>> favour of the main site at If you'd a like a different
>>>>>>> subdomain for specific service just let me know.
>>>>>>> Cheers,

>>>>>> What are the options?
>>>>> Guess that depends on the service you have in mind :) My goal is to
>>>>> avoid fragmentation of the presence at and subdomains,
>>>>> so favour only creating new subdomains that do something highly
>>>>> specific and do not duplicate functionality or content available
>>>>> elsewhere.
>>>>> Cheers,
>>>>> Tom.

>>>> I am not quite understanding you.
>>>> What would you see as the scheme for an instance of ODS that gives LOD
>>>> members URIs (of the FOAF+SSL variety)?
>>>> Personally, I have no particular interest in pushing this with you per
>>>> se.
>>>> If you somehow deem this unimportant, no problem, I move on etc..
>>> So the proposal is for another equivalent ODS instance, but one that
>>> adds FOAF+SSL support?
>>> If so, then this does sound important, as FOAF+SSL seems to have lots
>>> to offer. The problem I'm trying to address is as follows: the
>>> feedback I got from people about the previous offering at
>>> was that it was confusing. People didn't
>>> understand what was going on or being offered, and the end result
>>> seemed to be further fragmentation of Linked Data coverage -
>>> particularly problematic for newbies. Therefore a very trimmed down
>>> service offering just personal URIs with FOAF+SSL support would seem
>>> to be of benefit, but I'm not sure of the value of replicating the
>>> previous offering with enhancements.
>>> Thoughts?
>>> Incidentally, the previous instance seems to have died. Can it be
>>> reinstated while we finish porting the content across?
>>> Cheers,
>>> Tom.

>> Tom,
>> Goal is to have a place for people to easily obtain personal URIs. In a way,
>> official LOD community Web IDs.
>> FOAF+SSL is the most important feature here and LOD should be a launch pad.
>> Possible options:
>> This is how it works:
>> 1. New Users open accounts
>> 2. Edit profile
>> 3. Click a button that makes an X.509 certificate, exports to browser, and
>> writes to FOAF space
>> 4. Member visits any FOAF+SSL or OpenID space on the Web and never has to
>> present uid/pwd
>> For existing members, they simply perform steps 3-4.
>> --
>> Regards,
>> Kingsley Idehen Weblog:
>> President & CEO OpenLink Software Web:
> --
> Dr Tom Heath
> Researcher
> Platform Division
> Talis Information Ltd
> T: 0870 400 5000
> W:
Dr Tom Heath
Platform Division
Talis Information Ltd
T: 0870 400 5000

Search Engine Optimisation for People with a Conscience

I’ve spent a fair amount of time recently cleaning up spammy reviews on Revyu, the Linked Data/Semantic Web reviewing and rating site. The main perpetrators of these spammy reviews seem to be self-appointed Search Engine Optimisation (SEO) “experts” (who even advertise themselves as such on LinkedIn). Their main strategy appears to be polluting the Web with links to fairly worthless sites, in the hope of gaining some share of search engine traffic.

Getting a piece of the action I have no objection to per se. This was exactly my aim with my (currently somewhat on ice) shop window to Amazon – visitors could find products via search engines and, if desired, buy them through a trusted supplier, earning me enough commission on the side to pay my hosting bill for a month or two. The difference here is that I just tweaked the site layout to show off the content to search engines in its best light. I never polluted anyone else’s space to gain exposure. People that do this are getting me down.

Revyu has become somewhat popular as a target, presumably due to its decent ranking in the search engines. The site didn’t gain this position through spamming other sites with backlinks, but by having some simple principles baked into the site design from the start. They’re the same basic principles I’ve used on any site I’ve created, and have generally served me well. A few years ago I wrote down the principles that guide me, and I share this first draft here as a service to people who want to optimise the exposure of their site and still be able to sleep at night.

Before you read the tips, though, bear this in mind: there is something of an art to this, but it isn’t rocket science, and it certainly isn’t black magic. If you can create a Web site then you can optimise pretty well for search engines without paying a self-appointed “expert” a single penny. This is bread and butter stuff. These approaches should be part of the core skill set of any Web developer rather than an afterthought addressed through some external process. The tips below are not guaranteed to work and may become defunct at any time (some may be defunct already – does anyone ever use frames these days?). However, follow these and you’ll be 80% of the way there.

Search Engine Optimisation Tips

  1. there’s only so much you can do, and this may change at any time
  2. don’t try and trick the search engines, just be honest
  3. use web standards and clean code
  4. use css for styling and layout
  5. put important text first in the page; let this influence your design, it’s probably what users want too, especially if they’re on non-standard browsers
  6. choose page titles carefully
  7. use meta tags, but only if they’re accurate
  8. use robot meta tags, and robots.txt
  9. use structural markup, especially headings
  10. give anchors sensible text (“click here” does not qualify as sensible)
  11. use link titles and alt text
  12. give files and folders meaningful names
  13. provide default pages in directories so people can hack your URLs
  14. forge meaningful (human) links with other sites, and make technical links accordingly
  15. encourage inward links to your site
    • make urls readable and linkable to
    • don’t break links (at least give redirects)
  16. don’t use javascript for links/popup windows that you want to be indexed
  17. avoid links embedded in flash movies
  18. never use frames
  19. never use cookies to power navigation
  20. give example searches or browse trees to open databases to search engines
  21. maximise the content richness of pages
  22. avoid leaf node pages (always create links back to the rest of the site)
  23. limit the use of PDFs
  24. take common typos into account, or spelling variations (optimisation vs optimization is a good example)
  25. update the site regularly
  26. don’t use hidden text or comments to try and convey spam words
  27. don’t embed text in images
  28. avoid writing out text using javascript
  29. don’t use browser detection to alter content or restrict access
  30. provide meaningful error pages
  31. be realistic about what you can achieve optimisation-wise
  32. establish a traffic baseline
  33. use monitoring tools to track your progress
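Several of these tips (careful titles, accurate meta tags, structural markup, sensible anchor text, alt text) can be shown in a minimal page skeleton. The site, URLs and copy below are placeholders invented for illustration, not a real site:

```html
<!-- Placeholder markup sketching tips 6, 7, 8, 9, 10 and 11 above -->
<html>
<head>
  <!-- tip 6: a descriptive, carefully chosen title -->
  <title>Handmade Oak Furniture – Workshop in Bristol</title>
  <!-- tip 7: meta tags, but only if they're accurate -->
  <meta name="description" content="Handmade oak tables and chairs, built to order in Bristol.">
  <!-- tip 8: robot meta tags (robots.txt complements this at the site root) -->
  <meta name="robots" content="index,follow">
</head>
<body>
  <!-- tip 9: structural markup, especially headings; tip 5: important text first -->
  <h1>Handmade Oak Furniture</h1>
  <h2>Dining Tables</h2>
  <!-- tips 10 and 11: sensible anchor text and a link title, not "click here" -->
  <p>See our <a href="/tables/oak-dining/" title="Oak dining tables, built to order">oak
  dining tables</a>.</p>
  <!-- tips 11 and 27: text belongs in text, and images get alt text -->
  <img src="/images/oak-table.jpg" alt="Six-seat oak dining table">
</body>
</html>
```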

At some point I hope to provide evidence backing up each of these claims. In the meantime you’ll just have to trust me, but it won’t cost you anything.

New Year, New Blog

I got a bit bored of the limited functionality of my old blog at — that platform just wasn’t keeping pace with the state of the art in blogging software, which is a shame, as I specifically started using it because of the FOAF output — so have decided to switch to a WordPress install here at I’ve managed to import the 20 most recent posts from the old blog using the WordPress import tool for RSS (hooray for data in reusable formats) but not the earlier ones as there is no RSS feed for those (boo to that needless limitation). There’s still some cleanup to do on the imported posts, some general housekeeping tasks and a decent theme to choose, which will be a good displacement activity for me over the next few weeks. Any suggestions for essential WP plugins to install would be welcomed.

What's with the images in Cuil?

I’ve just been having a play with Cuil. In general I really like it, particularly the richer layout. What is very weird (aka rubbish) though is the algorithm they’re using to select images for display next to each result. A quick search for Talis shows some relatively sensible accompanying images, although I’m not sure who the young guy with the beard is.

A bit of vanity searching though throws up all sorts of weirdness. This time who is the old dude with the beard known as 303 See Other? He looks kind of familiar, but there’s no way it’s me. And who’s the other young guy with the wispy chin hair, and why is he squatting on my publications page? I like the juxtaposition of Linked Data and the Killer App image, but why? There seem to be far too many false positives, so come on Cuil, up the confidence threshold slightly.

Old Web Site, New Location

After leaving it to languish for years, I’ve finally made some good use of, which is the new location for my Web site that previously lived at The content is pretty much the same, and badly needs an update, but this is the first stage of my migration from Web hosting at KMi. No plans to move this blog just yet, although without a few improvements from the my.opera team I might be tempted.

Continental In-Flight Entertainment Runs Linux

Watching any system spontaneously reboot is a slightly unnerving experience, especially when it’s on a Boeing 757 during take-off. This happened to me last night on a Continental flight back to the UK after Linked Data Planet, and luckily (at least as far as I know) it was only the in-flight entertainment system that restarted, and presumably at the hands of one of the cabin crew.

I’m guessing the cabin crew aren’t geeks, but I was mildly entertained to see that the system runs on Linux, with the penguin there for all to see. I couldn’t get any photos of the startup messages, as turning on my phone at that point seemed like a bad idea, for so many reasons, but watching the whole process was mildly more entertaining than a game of TuxRacer. The only disappointment came when I got back to Bristol airport and saw that the machine running the baggage carousel screens was behind with its Windows updates. Sigh.

Powerset: More Than Just a Pretty Face?

For this month’s Semantic Web Gang podcast we were joined by Barney Pell from Powerset, who recently launched a public beta of their long-awaited natural language query engine operating over Wikipedia data. Amid all the buzz, it was great to hear about Powerset straight from the horse’s mouth, and prompted me to spend some time exploring the system. This post is about what I found.

I took Charlie Chaplin as my starting point, wanting a topic that should have fairly broad coverage, and asked “who did Charlie Chaplin marry?”. Powerset returned the name “Mildred Harris” in the results, which seemed like a fairly reasonable response. I have no idea if it’s correct, but looking for the same information via DBpedia I found two answers: Mildred Harris and Paulette Goddard. Interesting that Powerset didn’t pick up both of those, or at least it didn’t show me those in the first set of results.

Interestingly the results page for this query shows “Factz” at the top that the Powerset algorithms have extracted from the Wikipedia articles, presented (broadly speaking) in the form of subject, predicate and object triples, e.g. “Chaplin married actress, Mildred Harris”, and showing the sentence context from which they were extracted. At a general level this reminds me of Vanessa‘s work on PowerAqua, which breaks queries down into “linguistic triples” and operates pretty impressively over existing RDF data sets. I can’t help feeling that Powerset’s triple extraction algorithms and the PowerAqua query engine could be an interesting combination.
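The “Factz” idea is easy to sketch: treat each extracted fact as a subject–predicate–object triple and answer questions by matching a pattern against the triple store. This is a toy illustration of the general idea, not Powerset’s or PowerAqua’s actual machinery, and the triples are just the examples from this post:

```python
# Toy triple store: "Factz" as (subject, predicate, object) tuples.
# These examples are taken from the post, not extracted by Powerset.
factz = [
    ("Chaplin", "married", "Mildred Harris"),
    ("Chaplin", "married", "Paulette Goddard"),
    ("Wales", "built", "bridge"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Who did Charlie Chaplin marry?" becomes the pattern (Chaplin, married, ?)
spouses = [o for (_, _, o) in match(factz, s="Chaplin", p="married")]
print(spouses)  # ['Mildred Harris', 'Paulette Goddard']
```

A query engine like PowerAqua goes much further, of course — mapping free text onto “linguistic triples” and reconciling vocabulary differences — but the underlying data shape is the same.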

Underneath the “Factz” at the top of the results page are a series of “Wikipedia Article” results, the first of which contains the sentence from which the “Chaplin married actress, Mildred Harris” information is extracted. The key parts of this sentence are also highlighted, enabling me to pick out the information that answered my question (in part at least).

By this point I was fairly taken with the interface, which is sweeter eye candy than either Wikipedia or DBpedia, but not necessarily faster than either, and may be guilty of presenting only half the picture. I’m also not yet convinced that, if we took a large sample of natural language queries and compared the results returned, Powerset would significantly outperform Google, which is consistently good at highlighting in its search results the passage of a document that is relevant to the query. Of course Google uses a much larger corpus than Powerset, but it’s interesting to note that the summary of the first result for the Charlie Chaplin query on Google reads “Charlie Chaplin was married four times and had 11 children between 1919 and 1962”.

To continue my exploration I tried another natural language query: “what is the population of brazil?”. This would seem like something of a no-brainer for a search engine with any semantic capabilities, and access to the rich knowledge bound up in Wikipedia. However, this time there were no headline “Factz” helping the answer to jump out at me. Instead there were Wikipedia Article results, the first of which was a node titled “Population of Brazil” that comes with an accompanying chart, but does not show the actual answer based on the latest available figures. Result number 4 (“Economy of Brazil”) does have as its result summary the text “In the space of fifty five years (1950 to 2005), the population of Brazil grew by 51 million to approximately 180 million inhabitants, an increase of over 2% per year”, but none of this is highlighted as the answer to my question.

Going back to the Charlie Chaplin example, I followed the associative links in my own mental Web and arrived at the entry for “Waibaidu Bridge”, an historic landmark on the Shanghai waterfront, located (when it’s not been taken away for repairs) just down the street from the Astor House Hotel, another Shanghai landmark where Chaplin apparently stayed on more than one occasion. Waibaidu Bridge has an entry on Wikipedia, and therefore also an entry on DBpedia and in Powerset.

The Wikipedia entry itself is a really nice one; just enough historical background to be useful, a couple of bits of trivia (the bridge features briefly in the film Empire of the Sun), and a manually compiled list of places nearby. All of this is visible in Powerset, wrapped in their rather more 2008 interface. There are also a number of “Factz” extracted from the text of the Wikipedia article and presented in a box on the right. These are simply more of the subject, predicate, object triples mentioned previously, and sadly they add little value to the article. Here are some examples from the first section of the article:

* name bears name
* Waibaidu bore name
* citizens use ferries
* decade(1850) increases need

There are a couple that capture key elements from the article:

* Wales built bridge (note that this was a person named Wales, rather than the country Wales)
* Chinese paid toll (reflecting the history of the original Waibaidu bridges and the discriminatory tolls charged to Chinese people crossing them)

However these are mostly drowned out by the surrounding noise:

* ferry eases traffic
* Outer ferried cross
* powers restrict people

In the end it’s quicker just to read the article, as you’ll need to do so anyway to understand the “Factz” and check that they stand up. The “ferry eases traffic” “Fact” is actually incorrect, as the sentence from which this is extracted reads “In 1856, a British businessman named Wales built a first, wooden bridge at the location of the outermost ferry crossing to ease traffic between the British Settlement to the south, and the American Settlement to the north of Suzhou River.”, which has quite the opposite implication.

All this aside, one glaring omission from Powerset struck me when looking at this page, and it was this that really made me wonder whether Powerset is anything more than just a pretty face. Some thoughtful geodata geek has made the effort to record the geo coordinates of Waibaidu bridge in the Wikipedia entry; 31°14’43″N, 121°29’7.98″E apparently. Now Wikipedia doesn’t seem to do anything in particular with this data; the list of places nearby is manually compiled. I’ll forgive them this, as I imagine they have their hands full with keeping the whole operation running. Perhaps if I donated some money they would consider doing this by default for all entries with geo-coordinates.

However, what isn’t so easily forgiven is Powerset ignoring this information completely, not even bothering to start the page with a small Google map next to the nice old photo showing the view to the Bund, let alone thinking to use a service like Geonames to compute from the Web of Data a list of nearby places. (For the record, DBpedia doesn’t do this itself, but by making the effort to link items across the Wikipedia and Geonames data sets it does the majority of the hard work already). For an application that gets so closely associated with the Semantic Web effort (whether Powerset desire this or not) I find this omission quite sad. It’s such a no brainer, and beautifully demonstrates the kind of thing that will separate Semantic Web applications from just more closed world systems that happen to do something smart. I put this question about use of external data sets to Barney in the Gang podcast, but, either due to the intensity of the medium or bad communication on my part due to my cold-addled brain, the true meaning of my question was lost.
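The “nearby places” computation really is a no-brainer: convert the article’s degrees–minutes–seconds coordinates to decimal and apply the haversine great-circle formula. A minimal sketch follows; the comparison coordinates for the Astor House Hotel are rough values I’ve assumed for illustration, not taken from any data set:

```python
import math

def dms_to_decimal(deg, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to decimal degrees."""
    dec = deg + minutes / 60 + seconds / 3600
    return -dec if hemisphere in ("S", "W") else dec

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Waibaidu Bridge, per the Wikipedia entry: 31°14'43"N, 121°29'7.98"E
bridge = (dms_to_decimal(31, 14, 43, "N"), dms_to_decimal(121, 29, 7.98, "E"))

# Rough coordinates for the Astor House Hotel, assumed for illustration
astor_house = (31.2465, 121.4846)

print(round(haversine_km(*bridge, *astor_house), 2), "km")
```

A real “places nearby” feature would run this distance check against a gazetteer like Geonames rather than a single hand-picked point, but the geometry is no harder than this.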

The question of when Powerset will open up its technology to other text sources, and even the Web at large, always comes up. For me this is a less interesting question than the one about when/if the company will make use of existing structured data sets in their user-facing tools. I hope that with time, and perhaps less pressure now that a product is out the door, Powerset will implement the kind of features I talk about above, as the starting point for becoming a true Semantic Web application. Until then however, the current product will be, for me, really just Wikipedia hiding behind a pretty face.

Garlik Launches FOAF Services

The FOAF space got a whole lot more interesting yesterday, when Garlik released two FOAF services under their QDOS umbrella. The first is effectively a viewer on FOAF data crawled from across the Web. Have a look at the data about Danny Ayers to see it in action.

As far as I can tell, having looked very briefly yesterday (the QDOS site is down at the moment for maintenance), the second service will use the collected social network data to enable services such as blog comment whitelisting based on connections in the graph, presumably in the manner used by the DIG blog at MIT.

For some reason I find this service much more exciting than the Google Social Graph API. Perhaps it’s because the first incarnation of the SG API obviously didn’t get FOAF, and claimed that Mischa Tuffield was trying to steal my identity (presumably because he had a fragment of RDF about me in his FOAF file). The SG API does seem to have improved (I can’t replicate the original bug), and is useful for finding who still links to my FOAF in its old location, but I’m still more drawn to what Garlik have to offer. Perhaps it’s because I trust them to do it right; whilst there are currently errors in the output about me (John Domingue and I are apparently the same person), I know exactly where the error comes from, and it’s human. Perhaps it’s because the Social Graph API feels polluted by XFN.

As yet there don’t seem to be any actual APIs or machine-friendly services offered by Garlik over this FOAF data, and with the site being down I can’t hunt around for these. Requesting different content types from the site doesn’t have any effect either, but knowing the people behind Garlik there’ll be some interesting stuff on its way. Full SPARQL over FOAF data would be nice 😉 Either way, this could well be the trigger for large-scale updating of people’s FOAF files, something which is long overdue, my own included.
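Requesting alternative representations of a resource is normally done with an HTTP Accept header. Here is a sketch of what that probe looks like; the URL is a placeholder rather than a real QDOS endpoint, and the code only builds the request rather than sending it:

```python
import urllib.request

# Content negotiation sketch: ask a server for RDF rather than HTML.
# The URL is a placeholder, not a real QDOS endpoint; sending the
# request would need a live server, so we only construct it here.
url = "http://example.org/people/danny-ayers"
req = urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})

print(req.get_header("Accept"))  # application/rdf+xml
```

A server that supports content negotiation would answer such a request with RDF (or a 303 redirect to it); one that ignores the Accept header, as the QDOS site appeared to at the time, just returns the HTML page regardless.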

Update: The QDOS site came back up shortly after I posted this. The second service is broadly as described above. It’s called the “Social Verification” service. Tom Ilube from Garlik talks about this in some more detail in Issue 2 of Nodalities Magazine.