Friday 4 December 2009

The semantic web: the internet as a global database

More and more data on the internet is being published in reusable and remotely queryable formats. Some of us may be familiar with XML, which is a way of structuring data so that it can be interpreted by a variety of different applications and devices; RSS feeds, for example, are specified using XML.

To make this clearer, here is an example:

We all have a common understanding of the concept of “a book”. We understand that there are several elements to a book: it has an author, a publisher, a title and so on. So we have a common shared frame of reference for how we define a book.

However, to a computer the concept of ‘a book’ is meaningless. What structured data formats such as XML do is allow the creation of common definitions so that, say, within the publishing industry, information can be specified in the same way and shared easily between companies. So in the XML world a book can be defined as having a title, author, ISBN, publisher and so on, so that different computer systems and applications all ‘understand’ the definition of a book and can then manipulate that data with a common frame of reference.
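To make that concrete, here is a minimal sketch in Python of a program reading a book record. The element names (`title`, `author`, `publisher`, `isbn`) and the sample values are made up for illustration; a real industry would agree its own schema. The point is that any program that knows the shared definition can pull out the same fields.

```python
import xml.etree.ElementTree as ET

# Illustrative record in a hypothetical shared "book" vocabulary.
# The element names and values are assumptions, not a real industry schema.
BOOK_XML = """
<book>
  <title>An Example Book</title>
  <author>A. N. Author</author>
  <publisher>Example Press</publisher>
  <isbn>978-0-00-000000-0</isbn>
</book>
"""

FIELDS = ("title", "author", "publisher", "isbn")


def parse_book(xml_text):
    """Read a <book> element into a dict keyed by the agreed field names."""
    root = ET.fromstring(xml_text.strip())
    # findtext returns None when an element is missing, so gaps stay visible.
    return {field: root.findtext(field) for field in FIELDS}


book = parse_book(BOOK_XML)
print(book["title"], "by", book["author"])
```

Any other system that receives a record in the same shape can reuse `parse_book` unchanged; that reuse is the "common frame of reference" in practice.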

What this then means is that intelligent agents or spiders (essentially programs that crawl the web) can scan different websites, gather data and make valid comparisons. This is how price comparison sites such as Kelkoo and Confused.com work.
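As a rough sketch of what such a spider does once it has fetched the data, the Python below compares two retailers' feeds for the same book. The feed structure, shop names and prices are all hypothetical, and a real comparison site would also have to fetch, validate and refresh its data; this only shows why a shared format makes the comparison itself trivial.

```python
import xml.etree.ElementTree as ET

# Two hypothetical retailer feeds. Because both follow the same (assumed)
# <offer> structure, one program can read and compare them.
FEEDS = {
    "shop-a": "<offers><offer><isbn>978-0-00-000000-0</isbn><price>9.99</price></offer></offers>",
    "shop-b": "<offers><offer><isbn>978-0-00-000000-0</isbn><price>8.49</price></offer></offers>",
}


def collect_prices(feeds, isbn):
    """Return (price, shop) pairs for one ISBN across every feed."""
    results = []
    for shop, xml_text in feeds.items():
        root = ET.fromstring(xml_text)
        for offer in root.iter("offer"):
            if offer.findtext("isbn") == isbn:
                results.append((float(offer.findtext("price")), shop))
    return results


def cheapest(feeds, isbn):
    """Pick the lowest-priced offer, or None if nobody lists the book."""
    prices = collect_prices(feeds, isbn)
    # Tuples compare element by element, so min() picks the lowest price.
    return min(prices) if prices else None


print(cheapest(FEEDS, "978-0-00-000000-0"))  # (8.49, 'shop-b')
```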

The next evolution of this is called the Semantic Web. “Semantics” is about the meaning of things, and the Semantic Web is described as a state of the internet where computers can not only recognise and compare structured data but also understand how different pieces of information relate to each other.

“The Semantic Web describes the relationships between things (like A is a part of B and Y is a member of Z)…and the properties of things (like size, weight, age and price)” (source w3schools.com)
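One simple way to picture this is as a set of subject, predicate, object statements. In the sketch below the names (`partOf`, `memberOf`, `price` and so on) are illustrative rather than taken from any standard vocabulary, and the rule that `partOf` is transitive is an assumption coded in by hand. It shows a computer working out a relationship (a piston is part of the car) that no single statement spells out.

```python
# Facts as (subject, predicate, object) triples. All names are illustrative.
facts = {
    ("engine", "partOf", "car"),
    ("piston", "partOf", "engine"),
    ("alice", "memberOf", "book-club"),
    ("car", "price", 15000),
    ("car", "weightKg", 1200),
}

# Predicates treated as relationships between things rather than properties.
RELATIONSHIPS = {"partOf", "memberOf"}


def properties_of(subject, triples):
    """Return the plain properties (size, weight, price, ...) of a subject."""
    return {p: o for s, p, o in triples if s == subject and p not in RELATIONSHIPS}


def parts_of(whole, triples):
    """Return everything that is directly or indirectly part of `whole`.

    Assumes 'partOf' is transitive: if A is part of B and B is part of C,
    then A is also part of C.
    """
    found = set()
    frontier = [whole]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "partOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


print(sorted(parts_of("car", facts)))  # ['engine', 'piston']
print(properties_of("car", facts))
```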

This moves us towards the vision of the web originally envisaged by its creator Tim Berners-Lee as a universal medium for data, information and knowledge exchange.

Back in 1999 he said:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”

And even though a statement made a decade ago can seem prehistoric by today’s fast-moving standards, elements of this vision are taking shape.

The World Wide Web Consortium (W3C) is the organisation responsible for setting technical standards on the web. It is developing a series of standards designed to make data as openly accessible and linkable as possible, so that automated software can store, exchange and use machine-readable information distributed throughout the Web. This should enable users to deal with information more efficiently and with greater certainty.

One element of these standards is RDF (Resource Description Framework). Putting information into RDF makes it possible for intelligent agents and spiders to search, discover, pick up, collect, analyse and process information from all over the internet. In essence, the Semantic Web uses RDF to describe the content and resources on the internet.
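RDF statements are triples (subject, predicate, object), usually identified by URIs, and one of the standard text formats for writing them down is N-Triples. The sketch below assumes a hypothetical `example.org` book and uses Dublin Core property URIs. The parser is deliberately minimal (it handles only URIs and plain quoted strings); a real application would use a proper RDF library such as rdflib instead.

```python
import re

# Two statements in N-Triples format. The example.org URIs are hypothetical;
# the predicates are Dublin Core terms.
NTRIPLES = """
<http://example.org/book/1> <http://purl.org/dc/terms/title> "An Example Book" .
<http://example.org/book/1> <http://purl.org/dc/terms/creator> <http://example.org/person/author> .
"""

# Simplified pattern: IRIs and plain string literals only (no escapes,
# language tags, datatypes or blank nodes).
LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')


def parse_ntriples(text):
    """Turn each non-empty, non-comment line into a (subject, predicate, object) tuple."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line}")
        subject, predicate, obj_iri, obj_literal = m.groups()
        obj = obj_iri if obj_iri is not None else obj_literal
        triples.append((subject, predicate, obj))
    return triples


for s, p, o in parse_ntriples(NTRIPLES):
    print(p.rsplit("/", 1)[-1], "->", o)
```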

And as data across the internet becomes part of this standard format, it transforms the web from a random collection of pages into one huge database, with each piece of data linked to the others in a way that computers can understand.
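Here is a toy illustration of the "one huge database" idea: triples from two hypothetical sites are merged by simple set union, then queried with a pattern in which `?`-prefixed terms are variables, loosely in the style of SPARQL, the W3C's query language for RDF. The short prefixed names (`dc:title`, `foaf:name`) stand in for full URIs, and the matcher is an illustration, not a SPARQL engine.

```python
# Triples from two hypothetical sites. Because both use the same
# identifiers, merging them is just a set union.
site_a = {
    ("book:1", "dc:title", "An Example Book"),
    ("book:1", "dc:creator", "person:42"),
}
site_b = {
    ("person:42", "foaf:name", "A. N. Author"),
}
graph = site_a | site_b


def match(pattern, graph, bindings=None):
    """Yield variable bindings that satisfy every triple pattern in the list.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    if bindings is None:
        bindings = {}
    if not pattern:
        yield bindings
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(bindings)
        ok = True
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)


query = [("?book", "dc:title", "?title"),
         ("?book", "dc:creator", "?who"),
         ("?who", "foaf:name", "?name")]
for row in match(query, graph):
    print(row["?title"], "was written by", row["?name"])
```

Neither site on its own can answer "who wrote this book, by name?"; the answer only appears once the two sets of statements are joined.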

So what does that mean for marketers?

Put simply, it means that the internet is going to get organised.

As search engines start to recognise semantically tagged data in the pages they crawl, structured data in RDF formats will let them present far more compelling summaries of those pages in their search results.
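As an illustration of how a crawler might turn tagged markup into a richer result, the sketch below pulls out the values of elements carrying RDFa-style `property` attributes using Python's built-in HTML parser. The vocabulary URL, the property names and the snippet format are assumptions, and real RDFa processing is considerably more involved.

```python
from html.parser import HTMLParser

# A product page marked up with RDFa-style attributes. The vocabulary URL
# and property names are hypothetical.
PAGE = """
<div vocab="http://example.org/vocab/" typeof="Product">
  <h1 property="name">Example Kettle</h1>
  <span property="price">24.99</span>
  <span property="ratingValue">4.5</span>
  <p>Free delivery on orders over 20 pounds.</p>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect the text of every element carrying a `property` attribute.

    Simplified: real RDFa also handles nested elements, `content` attributes,
    subjects set via `about`/`resource`, and prefixes.
    """

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop
            self.properties[prop] = ""

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current:
            self.properties[self._current] += data


def snippet(html):
    """Build a one-line search result from the extracted fields."""
    parser = PropertyExtractor()
    parser.feed(html)
    p = parser.properties
    return f"{p.get('name', '?')}: £{p.get('price', '?')} (rated {p.get('ratingValue', '?')}/5)"


print(snippet(PAGE))  # Example Kettle: £24.99 (rated 4.5/5)
```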

It’s basically like current search engine marketing – but on steroids. In fact this aspect is often referred to as Search Engine Optimisation Plus (SEO+). Search is seen as the killer application for the Semantic Web that will finally drive its growth.

So, at the most fundamental level, marketers will need to manage their brands in the semantic layer that consumers cannot see as actively as they manage them in the layers consumers can see, such as websites and online advertising.

Scott Brinker on the Chief Marketing Technologist Blog suggests that:

“Marketing becomes champion of the underlying data – good, accurate, detailed content and the processes by which to keep it up to date. This isn’t just old school “marketing data” ie the stuff of brochures and the visual corporate website, but rich, detailed information that’s historically been trapped much deeper in the organisation – information that can create value for the firm by its wide disseminations….this constitutes a new kind of market positioning and placement…semantic branding if you will”


And having data in a more accessible format will mean that organisations look to build rich applications on top of this data.

One example given by Tim Berners-Lee is of being able to combine your calendar and bank statements. If both of these talked the same language then a user would be able to drag their digital bank statements onto their calendar and a series of dots would appear showing the user when they spent their money. Now imagine that you still can’t remember where a particular transaction happened – then you could drag your photo album on top of your calendar and be reminded that you used your credit card at the same time you were taking pictures of your kids at a theme park.
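Stripped to its essentials, "talking the same language" here means sharing a common way of expressing dates. The sketch below, using made-up transactions and photos, drops records from two separate sources onto one calendar keyed by date, which is enough to put the theme-park payment and the photos side by side. The record fields (`date`, `payee`, `file`) are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records from two separate sources. The only thing they need
# to share is a common way of expressing dates.
transactions = [
    {"date": date(2009, 8, 14), "payee": "Theme park", "amount": 62.00},
    {"date": date(2009, 8, 20), "payee": "Supermarket", "amount": 45.10},
]
photos = [
    {"date": date(2009, 8, 14), "file": "kids_rollercoaster.jpg"},
]


def overlay(*sources):
    """Group items from any number of sources onto a single calendar, by date."""
    by_day = defaultdict(list)
    for source in sources:
        for item in source:
            by_day[item["date"]].append(item)
    return by_day


by_day = overlay(transactions, photos)
for day in sorted(by_day):
    print(day, [item.get("payee") or item.get("file") for item in by_day[day]])
```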

And if we look at comparison or price aggregation sites, the Semantic Web will let consumers make more accurate and reliable comparisons, not just for more complex and configurable products such as cars and holidays, but also for services such as builders, accountants and solicitors, because the information on those products and services can be far richer and more structured.

Let’s take another example of a consumer searching for a ‘holiday’ in the future. The Semantic Web will allow people to use these ‘information agents’ and set them tasks such as “go and find me a holiday” that’s:
• In Greece
• By a beach
• But also has some historical interest
• That fits my calendar
• And fits within my budget
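Once offers are published as structured data, much of the agent's job becomes filtering. The sketch below encodes the five criteria above as a single check. The `Holiday` fields, the offers, the dates and the prices are all hypothetical, and a real agent would also have to find the offers and ask follow-up questions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Holiday:
    """One offer as an agent might receive it from a hypothetical structured feed."""
    name: str
    country: str
    near_beach: bool
    historical_sites: list
    start: date
    end: date
    price: float


def fits(h, *, country, budget, free_from, free_until):
    """Check one offer against the traveller's criteria."""
    return (
        h.country == country
        and h.near_beach
        and bool(h.historical_sites)                      # some historical interest
        and free_from <= h.start and h.end <= free_until  # fits my calendar
        and h.price <= budget                             # fits my budget
    )


candidates = [
    Holiday("Option A", "Greece", True, ["ancient site"], date(2010, 6, 5), date(2010, 6, 12), 850.0),
    Holiday("Option B", "Greece", True, [], date(2010, 6, 5), date(2010, 6, 12), 700.0),
    Holiday("Option C", "Greece", True, ["old town"], date(2010, 8, 1), date(2010, 8, 8), 900.0),
    Holiday("Option D", "Portugal", True, [], date(2010, 6, 5), date(2010, 6, 12), 600.0),
]

matches = [
    h.name for h in candidates
    if fits(h, country="Greece", budget=1000.0,
            free_from=date(2010, 6, 1), free_until=date(2010, 6, 30))
]
print(matches)  # ['Option A']
```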

The intelligent agent is then left to research the request instantly, asking additional questions where necessary. An early example of this is www.tripit.com. With Tripit, a user can simply forward their travel confirmation emails and the site's “Itinerator” takes over, combining all the related travel bookings (flights, car hire and hotels) into a single master itinerary. It then searches the Web to add related information such as daily weather, local maps, driving directions (for example, from the airport to the hotel), city guides and so on. You can then access your itinerary from a mobile device, synchronise it with your PC calendar, and share itineraries within group bookings to make sure there are no date issues or overlaps.
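The merging step can be sketched as ordering bookings by start time and checking for clashes. This is not how Tripit is actually implemented (that isn't public here); the bookings are invented, and the sketch assumes they have already been extracted from the confirmation emails, which is the hard part.

```python
from datetime import datetime

# Hypothetical bookings, as if already extracted from forwarded confirmation emails.
bookings = [
    {"kind": "hotel", "what": "Hotel One", "start": datetime(2010, 7, 1, 15, 0), "end": datetime(2010, 7, 8, 11, 0)},
    {"kind": "flight", "what": "Outbound flight", "start": datetime(2010, 7, 1, 8, 30), "end": datetime(2010, 7, 1, 12, 45)},
    {"kind": "car", "what": "Car hire", "start": datetime(2010, 7, 1, 13, 30), "end": datetime(2010, 7, 8, 9, 0)},
    {"kind": "hotel", "what": "Hotel Two", "start": datetime(2010, 7, 7, 15, 0), "end": datetime(2010, 7, 10, 11, 0)},
]


def build_itinerary(items):
    """Merge separate bookings into one master list ordered by start time."""
    return sorted(items, key=lambda b: b["start"])


def clashes(itinerary):
    """Find pairs of same-kind bookings whose times overlap, e.g. two hotels on one night."""
    found = []
    for i, a in enumerate(itinerary):
        for b in itinerary[i + 1:]:
            # Two intervals overlap when each starts before the other ends.
            if a["kind"] == b["kind"] and a["start"] < b["end"] and b["start"] < a["end"]:
                found.append((a["what"], b["what"]))
    return found


itinerary = build_itinerary(bookings)
for b in itinerary:
    print(b["start"].strftime("%d %b %H:%M"), b["what"])
print(clashes(itinerary))  # [('Hotel One', 'Hotel Two')]
```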

This will also be important to B2B marketers. Our consumer habits of online research have carried over into the workplace, and with increasing numbers of potential customers using search engines to research and shortlist suppliers, businesses will need to create and manage their semantic data.

Another example of the “Semantic Web” is something called ‘Friend of a Friend’ (FOAF). And this is interesting because of its impact on the future of social networking.

FOAF allows people to describe themselves using an RDF format. You can describe personal details, hobbies, relations to other people & things. But what’s fascinating about FOAF is that there is no one central database or repository of information. Your profile does not exist only in Facebook, or Bebo or Myspace. It’s a piece of data searchable by any computer. Computers may then use these FOAF profiles to find and relate people to one another.

It’s the 21st-century equivalent of ‘six degrees of separation’.
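Because FOAF profiles are just RDF statements, finding how two people are connected is a graph search. The sketch below runs a breadth-first search over `foaf:knows` links to count the degrees of separation. The people are hypothetical, the short `foaf:knows` name stands in for the full URI, and treating "knows" as two-way is a simplifying assumption of the sketch.

```python
from collections import deque

# foaf:knows statements that might be published on different people's own sites.
# Names are hypothetical.
statements = [
    ("ann", "foaf:knows", "bob"),
    ("bob", "foaf:knows", "cat"),
    ("cat", "foaf:knows", "dan"),
    ("eve", "foaf:knows", "ann"),
]


def degrees_of_separation(start, goal, triples):
    """Breadth-first search over foaf:knows links; returns the hop count or None."""
    neighbours = {}
    for s, p, o in triples:
        if p == "foaf:knows":
            # Assumption: treat 'knows' as two-way for this sketch.
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == goal:
            return hops
        for friend in neighbours.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


print(degrees_of_separation("eve", "dan", statements))  # 4
```

Breadth-first search is the natural choice here because it explores friends before friends-of-friends, so the first time it reaches the goal it has found the shortest chain.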

2 comments:

sjbrinker said...

Hi, Jed -- great post! Thanks for the mention of my post on semantic marketing.

The "Internet as a Global Database" is an incredibly exciting concept, and we're getting closer to it becoming real. The implications this will have on marketing are immense, and it's going to be a wonderful adventure discovering those new opportunities.

Unknown said...

I think it is important to include the concept of an ontology, the map of the system. RDFa tagging of webpages is a start, but the real power of triple stores and the contextual data coming out of them will lie in the underlying ontology, which is where the interesting stuff will happen with regard to the semantic web.