Archive Page 3
September 3rd, 2008 by Tom Heath
I’m here in Graz for Triple-I. All credit to Klaus Tochtermann, Hermann Maurer and many others for putting together what is shaping up to be a great event. Things got off to a good start with the first keynote, from Henry Lieberman, who talked about how we manage and utilise common sense knowledge. The talk got me thinking on many levels, such as how we build task- or goal-oriented interfaces, and whether we might be able to get the 21 “most commonly used relationships in common sense statements” into an appropriate form for use on the Semantic Web. With the bar already set very high, I’d best go and work some more on the slides for my own keynote on Friday, titled “Humans and the Web of Data”.
September 1st, 2008 by Tom Heath
ISWC2008 isn’t upon us just yet, but already the preparations for ISWC2009 are under way. I’m pleased to say that I’ll be serving as co-chair of the Semantic Web In Use track alongside Lee Feigenbaum.
In my experience the ISWC series has been growing steadily in strength year on year and, while I’m inevitably biased, last year’s conference did seem a watershed moment for the series and the Semantic Web as a whole. There was a tangible energy in the air that suggested the Semantic Web was no longer just a vision, but both real and inevitable. It will be interesting to see in Karlsruhe how things are shaping up one year on. I can only speculate about where we’ll be at by autumn 2009, but I’m very much looking forward to finding out.
August 21st, 2008 by Tom Heath
Leigh Dodds has just blogged publicly about his forthcoming move to Talis. From 1st September he’ll be joining us as Programme Manager for the Talis Platform. I’m personally really excited about having Leigh on board – he’s been an impressive figure on the Semantic Web scene for quite some time; IIRC I even used his FOAF-a-matic tool to create my first FOAF file back in the day. Not only will he bring some impressive skills to the company, but his move here further demonstrates that we can attract top-class Semantic Web talent. Leigh, welcome on board!
August 4th, 2008 by Tom Heath
I’ve just been having a play with Cuil. In general I really like it, particularly the richer layout. What is very weird (aka rubbish) though is the algorithm they’re using to select images for display next to each result. A quick search for Talis shows some relatively sensible accompanying images, although I’m not sure who the young guy with the beard is.
A bit of vanity searching though throws up all sorts of weirdness. This time who is the old dude with the beard known as 303 See Other? He looks kind of familiar, but there’s no way it’s me. And who’s the other young guy with the wispy chin hair, and why is he squatting on my publications page? I like the juxtaposition of Linked Data and the Killer App image, but why? There seem to be far too many false positives, so come on Cuil, up the confidence threshold slightly.
July 29th, 2008 by Tom Heath
The last 18 months have seen amazing progress in the world of Linked Data, but we now face a new challenge: availability of vocabularies to describe this data. OK, so it’s not really a new challenge at all, but this time it’s real, and urgent. Anyone stumbling across a tasty open data set on the Web is generally faced with the decision of whether to create the necessary vocabulary with which to describe the data, or walk away and find something to do that is more immediately gratifying. There just isn’t a critical mass of existing vocabularies with which to describe the data that is already out there on the Web.
Out of the desire to do something about this issue, and spurred on by discussions with a number of people in the community, Richard Cyganiak and I have set a ball rolling called VoCamp – lightweight, informal hackfests, where motivated people can get together and spend some dedicated time creating vocabularies/ontologies in any area that interests them. Thanks to the generous efforts of David Shotton and Jun Zhao, the first VoCamp will take place in Oxford in late September.
We hope that this is a ball that will roll beyond that one event, and are already talking to others who have expressed an interest in hosting a VoCamp where they are based. If you want to see the Web of Data realised, and share our view that the vocabulary bottleneck is just a little bit restrictive, perhaps you’d like to run a VoCamp where you live/work (or anywhere else you like). It’s very easy, just drop me (firstname.surname at talis.com) and Richard (firstname.surname at deri.org) an email and we’ll point you in the right direction.
July 8th, 2008 by Tom Heath
After leaving it to languish for years, I’ve finally made some good use of tomheath.com, which is the new location for my Web site that previously lived at http://kmi.open.ac.uk/people/tom/html. The content is pretty much the same, and badly needs an update, but this is the first stage of my migration from Web hosting at KMi. No plans to move this blog just yet, although without a few improvements from the my.opera team I might be tempted.
June 23rd, 2008 by Tom Heath
Watching any system spontaneously reboot is a slightly unnerving experience, especially when it’s on a Boeing 757 during take-off. This happened to me last night on a Continental flight back to the UK after Linked Data Planet, and luckily (at least as far as I know) it was only the in-flight entertainment system that restarted, and presumably at the hands of one of the cabin crew.
I’m guessing the cabin crew aren’t geeks, but I was mildly entertained to see that the system runs on Linux, with the penguin there for all to see. I couldn’t get any photos of the startup messages, as turning on my phone at that point seemed like a bad idea, for so many reasons, but watching the whole process was mildly more entertaining than a game of TuxRacer. The only disappointment came when I got back to Bristol airport and saw that the machine running the baggage carousel screens was behind with its Windows updates. Sigh.
May 22nd, 2008 by Tom Heath
For this month’s Semantic Web Gang podcast we were joined by Barney Pell from Powerset, who recently launched a public beta of their long-awaited natural language query engine operating over Wikipedia data. Amid all the buzz, it was great to hear about Powerset straight from the horse’s mouth, and it prompted me to spend some time exploring the system. This post is about what I found.
I took Charlie Chaplin as my starting point, wanting a topic that should have fairly broad coverage, and asked “who did Charlie Chaplin marry?”. Powerset returned the name “Mildred Harris” in the results, which seemed like a fairly reasonable response. I have no idea if it’s correct, but looking for the same information via DBpedia I found two answers: Mildred Harris and Paulette Goddard. Interesting that Powerset didn’t pick up both of those, or at least it didn’t show me those in the first set of results.
Interestingly the results page for this query shows “Factz” at the top that the Powerset algorithms have extracted from the Wikipedia articles, presented (broadly speaking) in the form of subject, predicate and object triples, e.g. “Chaplin married actress, Mildred Harris”, and showing the sentence context from which they were extracted. At a general level this reminds me of Vanessa’s work on PowerAqua, which breaks queries down into “linguistic triples” and operates pretty impressively over existing RDF data sets. I can’t help feeling that Powerset’s triple extraction algorithms and the PowerAqua query engine could be an interesting combination.
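For readers who haven’t met the subject, predicate, object form before, here is a minimal sketch of how such triples can be stored and queried with a wildcard pattern. It is only an illustration: the predicate name "spouse" and the `match` helper are assumptions made for this example, not Powerset’s, PowerAqua’s or DBpedia’s actual vocabulary or API. The two spouse values are simply the answers mentioned above.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Illustrative triples only. "spouse" is an assumed predicate name;
# the two spouse values are the answers reported above for DBpedia.
TRIPLES: List[Triple] = [
    ("Charlie Chaplin", "spouse", "Mildred Harris"),
    ("Charlie Chaplin", "spouse", "Paulette Goddard"),
    ("Wales", "built", "bridge"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Who did Charlie Chaplin marry?" expressed as a triple pattern.
spouses = [obj for _, _, obj in match(TRIPLES, s="Charlie Chaplin", p="spouse")]
```

A natural language question then becomes a pattern with one slot left open, which is roughly the idea behind breaking queries into “linguistic triples”.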
Underneath the “Factz” at the top of the results page are a series of “Wikipedia Article” results, the first of which contains the sentence from which the “Chaplin married actress, Mildred Harris” information is extracted. The key parts of this sentence are also highlighted, enabling me to pick out the information that answered my question (in part at least).
By this point I was fairly taken with the interface, which is sweeter eye candy than either Wikipedia or DBpedia, but not necessarily faster than either, and may be guilty of presenting only half the picture. I’m also not yet convinced that, if we took a large sample of natural language queries and compared the results returned, Powerset would significantly outperform Google, who are consistently good at highlighting in their search results the passage of a document that is relevant to the query. Of course Google uses a much larger corpus than Powerset, but it’s interesting to note that the summary of the first result for the Charlie Chaplin query on Google reads “Charlie Chaplin was married four times and had 11 children between 1919 and 1962”.
To continue my exploration I tried another natural language query: “what is the population of brazil?”. This would seem like something of a no-brainer for a search engine with any semantic capabilities, and access to the rich knowledge bound up in Wikipedia. However, this time there were no headline “Factz” helping the answer to jump out at me. Instead there were Wikipedia Article results, the first of which was a node titled “Population of Brazil” that comes with an accompanying chart, but does not show the actual answer based on the latest available figures. Result number 4 (“Economy of Brazil”) does have as its result summary the text “In the space of fifty five years (1950 to 2005), the population of Brazil grew by 51 million to approximately 180 million inhabitants, an increase of over 2% per year”, but none of this is highlighted as the answer to my question.
Going back to the Charlie Chaplin example, I followed the associative links in my own mental Web and arrived at the entry for “Waibaidu Bridge”, an historic landmark on the Shanghai waterfront, located (when it’s not been taken away for repairs) just down the street from the Astor House Hotel, another Shanghai landmark where Chaplin apparently stayed on more than one occasion. Waibaidu Bridge has an entry on Wikipedia, and therefore also an entry on DBpedia (http://dbpedia.org/resource/Waibaidu_Bridge) and in Powerset.
The Wikipedia entry itself is a really nice one; just enough historical background to be useful, a couple of bits of trivia (the bridge features briefly in the film Empire of the Sun), and a manually compiled list of places nearby. All of this is visible in Powerset, wrapped in their rather more 2008 interface. There are also a number of “Factz” extracted from the text of the Wikipedia article and presented in a box on the right. These are simply more of the subject, predicate, object triples mentioned previously, and sadly they add little value to the article. Here are some examples from the first section of the article:
* name bears name
* Waibaidu bore name
* citizens use ferries
* decade(1850) increases need
There are a couple that capture key elements from the article:
* Wales built bridge (note that this was a person named Wales, rather than the country Wales)
* Chinese paid toll (reflecting the history of the original Waibaidu bridges and the discriminatory tolls charged to Chinese people crossing them)
However these are mostly drowned out by the surrounding noise:
* ferry eases traffic
* Outer ferried cross
* powers restrict people
In the end it’s quicker just to read the article, as you’ll need to do so anyway to understand the “Factz” and check that they stand up. The “ferry eases traffic” “Fact” is actually incorrect, as the sentence from which this is extracted reads “In 1856, a British businessman named Wales built a first, wooden bridge at the location of the outermost ferry crossing to ease traffic between the British Settlement to the south, and the American Settlement to the north of Suzhou River.”, which has quite the opposite implication.
All this aside, one glaring omission from Powerset struck me when looking at this page, and it was this that really made me wonder whether Powerset is anything more than just a pretty face. Some thoughtful geodata geek has made the effort to record the geo coordinates of Waibaidu bridge in the Wikipedia entry; 31°14’43″N, 121°29’7.98″E apparently. Now Wikipedia doesn’t seem to do anything in particular with this data; the list of places nearby is manually compiled. I’ll forgive them this, as I imagine they have their hands full with keeping the whole operation running. Perhaps if I donated some money they would consider doing this by default for all entries with geo-coordinates.
However, what isn’t so easily forgiven is Powerset ignoring this information completely, not even bothering to start the page with a small Google map next to the nice old photo showing the view to the Bund, let alone thinking to use a service like Geonames to compute from the Web of Data a list of nearby places. (For the record, DBpedia doesn’t do this itself, but by making the effort to link items across the Wikipedia and Geonames data sets it does the majority of the hard work already). For an application that gets so closely associated with the Semantic Web effort (whether Powerset desire this or not) I find this omission quite sad. It’s such a no-brainer, and beautifully demonstrates the kind of thing that will separate Semantic Web applications from just more closed-world systems that happen to do something smart. I put this question about use of external data sets to Barney in the Gang podcast, but, either due to the intensity of the medium or bad communication on my part due to my cold-addled brain, the true meaning of my question was lost.
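To give a rough sense of how little is needed to turn those coordinates into a “places nearby” list, here is a minimal sketch. It converts the degrees/minutes/seconds values quoted above to decimal degrees and filters candidate places by great-circle (haversine) distance. It does not call Geonames or any other service: the candidate places, their names and their positions are hypothetical placeholders, and the 2 km radius is an arbitrary assumption.

```python
import math


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth (r = 6371 km)."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearby(origin, places, radius_km):
    """Return names of places within radius_km of origin, nearest first."""
    hits = []
    for name, (lat, lon) in places.items():
        distance = haversine_km(origin[0], origin[1], lat, lon)
        if distance <= radius_km:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]


# Coordinates as quoted from the Wikipedia entry: 31°14'43"N, 121°29'7.98"E.
bridge = (dms_to_decimal(31, 14, 43, "N"), dms_to_decimal(121, 29, 7.98, "E"))

# Hypothetical candidates, placed at made-up offsets from the bridge.
candidates = {
    "place_a": (bridge[0] + 0.005, bridge[1]),  # roughly half a kilometre north
    "place_b": (bridge[0] + 1.0, bridge[1]),    # roughly 111 km north
}

close_by = nearby(bridge, candidates, radius_km=2.0)
```

In a real application the candidate list would come from a linked data set rather than a hard-coded dictionary, which is exactly the part DBpedia’s links to Geonames make easier.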
The question of when Powerset will open up its technology to other text sources, and even the Web at large, always comes up. For me this is a less interesting question than the one about when/if the company will make use of existing structured data sets in their user-facing tools. I hope that with time, and perhaps less pressure now that a product is out the door, Powerset will implement the kind of features I talk about above, as the starting point for becoming a true Semantic Web application. Until then however, the current product will be, for me, really just Wikipedia hiding behind a pretty face.
May 20th, 2008 by Tom Heath
The FOAF space got a whole lot more interesting yesterday, when Garlik released two FOAF services under their QDOS umbrella. The first is effectively a viewer on FOAF data crawled from across the Web. Have a look at the data about Danny Ayers to see it in action.
As far as I can tell, having looked very briefly yesterday (the QDOS site is down at the moment for maintenance), the second service will use the collected social network data to enable services such as blog comment whitelisting based on connections in the graph, presumably in the manner used by the DIG blog at MIT.
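A minimal sketch of what whitelisting “based on connections in the graph” could look like: a breadth-first search over foaf:knows-style links, accepting a commenter if they are within a fixed number of hops of the blog owner. This is only an assumption about the general approach, not a description of how Garlik or the DIG blog actually do it. The names, the `knows` dictionary and the two-hop limit are all hypothetical.

```python
from collections import deque


def within_hops(knows, start, max_hops):
    """Return everyone reachable from start in at most max_hops steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in knows.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    distance.pop(start)
    return set(distance)


def is_whitelisted(knows, owner, commenter, max_hops=2):
    """Accept a commenter who is within max_hops of the blog owner."""
    return commenter in within_hops(knows, owner, max_hops)


# Hypothetical social graph: owner knows alice, alice knows bob, bob knows carol.
knows = {
    "owner": ["alice"],
    "alice": ["bob"],
    "bob": ["carol"],
}

trusted = within_hops(knows, "owner", max_hops=2)
```

With this toy graph, alice and bob fall inside the two-hop limit while carol, three hops away, does not.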
For some reason I find this service much more exciting than the Google Social Graph API. Perhaps it’s because the first incarnation of the SG API obviously didn’t get FOAF, and claimed that Mischa Tuffield was trying to steal my identity (presumably because he had a fragment of RDF about me in his FOAF file). The SG API does seem to have improved (I can’t replicate the original bug), and is useful for finding who still links to my FOAF in its old location, but I’m still more drawn to what Garlik have to offer. Perhaps it’s because I trust them to do it right; whilst there are currently errors in the output about me (John Domingue and I are apparently the same person), I know exactly where the error comes from, and it’s human. Perhaps it’s because the Social Graph API feels polluted by XFN.
As yet there don’t seem to be any actual APIs or machine-friendly services offered by Garlik over this FOAF data, and with the site being down I can’t hunt around for these. Requesting different content types from the site doesn’t have any effect either, but knowing the people behind Garlik there’ll be some interesting stuff on its way. Full SPARQL over FOAF data would be nice.
Either way, this could well be the trigger for large-scale updating of people’s FOAF files, something which is long overdue, my own included.
Update: The QDOS site came back up shortly after I posted this. The second service is broadly as described above. It’s called the “Social Verification” service. Tom Ilube from Garlik talks about this in some more detail in Issue 2 of Nodalities Magazine.
May 16th, 2008 by Tom Heath
It’s been a quiet month, blog-wise, mainly due to MyOpera being inaccessible from behind the “Great Firewall of China”. Not sure what content on here is worth screening out (except perhaps on quality grounds), but anyway…
I was in China for WWW2008 initially, and then two weeks of holiday, giving me a chance to see some of this vast country, meet some great people who went out of their way to help us, have varied success at avoiding being ripped off (an occupational hazard for travellers in China it seems), and catch a glimpse of somewhere undergoing huge changes.
On the subject of the Great Firewall, I’ll admit to being a bit disappointed that TimBL’s Keynote at WWW2008 didn’t address this issue more explicitly. On the other hand, he gave a great plug in his speech for the Linked Data on the Web workshop we co-chaired with Chris Bizer and Kingsley Idehen earlier in the week (summarised nicely at ZDnet by Paul Miller), which really made my day.
To be fair to Tim though, I wouldn’t have wanted to be in his shoes, which were undoubtedly treading a very fine line. Before the conference I was see-sawing between thinking “he’s got to address this issue head-on”, and thinking “no way, it would just be too confrontational to raise it in this venue”.
Yesterday I came across this blog post on the subject from the New Scientist. Whilst I wholly sympathise with the strength of feeling, I think the post itself is misplaced, or at least misdirected. The IW3C2/W3C/Web community at large has two choices: engage, and hope to bring about change through dialogue and stronger relationships, or keep China at arm’s length and stand no chance of influencing policy on censorship.
In the end I think the correct decision was made, just as siting the 2008 Olympics in Beijing was probably the right decision. Let’s just hope that the human rights situation improves as much as was promised (and as fast as the public toilet situation seems to have done in Beijing ahead of the Olympics – excellent, in case you were wondering).
If the New Scientist writer is really going to take issue with the WWW2008 slogan, I think an equally valid target should be the “One World” aspect. OK, so geographically it’s true, but on all other counts I’m not convinced.