This article looks at semantics and the semantic web: what the terms mean, and why exploiting technology in this field could give a key competitive advantage to companies that are savvy enough to spot the trends early.
Consider the following two questions:
1. Find Indian restaurants in Blackpool
2. What are my prospects doing about renewable energy in the North West
What is the difference between these two questions?
Both are specific, both are answerable, and both answers are valuable, albeit in different ways. However, any well-informed information consumer could reasonably expect an internet search engine (Yahoo, Google, Microsoft etc.) to answer one of them quickly and easily, while the other would be difficult to the point of infeasibility. Why?
If you analyse the assumptions underpinning these two questions you begin to understand why semantics are important in the field of search and information gathering.
In the first example there are two important terms: “Indian restaurant” and “Blackpool”. The important thing about them is that they are relatively unambiguous, and their meaning would be fairly universal among any given set of people. This doesn’t mean that the terms are completely unambiguous, however. There may well be several places around the world called “Blackpool”, and the term “Indian” as it relates to food covers a very diverse set of possibilities, for example curry, tandoori, kebabs, Bengali, Bangladeshi and Pakistani. The first question is relatively easy to answer mainly because a large number of humans broadly agree on what it means, and much less because of the physical mechanism a computer uses to find and group content on the internet in order to display it when the question is asked.
In the second example we run into difficulty almost immediately: what does the term “prospects” mean? Within any particular company or sales and marketing team the term is perfectly unambiguous, but to anyone outside that circle of understanding (i.e. the internet search engines) it is meaningless. Next we have the term “renewable energy”. This is certainly more specific and more broadly identifiable than “prospects”, but nevertheless it has a wide meaning, and the list of things it could represent would differ depending on who is asked.
Lastly we have a classically ambiguous term in “North West”. It is utterly meaningless unless you also know which country, counties, towns and postcodes are meant, or perhaps it is simply an entity manufactured to suit a particular business, its market and how it sells to that market. From this simple example we can see that semantics are important, and that lack of meaning in content is one of the primary barriers to getting machines to successfully find things for us.
One of the proposed approaches to solving this problem of “meaning” is the so-called semantic web, or Web 3.0 as it is sometimes called. This is a set of ideas and projects which aim to extend the World Wide Web to include some notion of semantics in information and services, essentially “describing” content with a uniform set of definitions at source and thereby disambiguating it. Web 3.0 is not a new concept; it has been around since an original 2001 article in Scientific American by web pioneer Tim Berners-Lee. It is fair to say that this is a much debated topic in the computer industry. Most often people discuss whether the concepts:
- Are feasible at all with the limitations of current web technology (like HTML)
- Will ever see the light of day outside small, well-defined research communities
There are some key barriers to fulfilling the goals of the semantic web, and they are human as well as technological. There is an underlying assumption that everyone will agree on how to “file” or structure information, yet even within relatively narrow domains of interest such agreement seems hard to reach. A good way of visualising these structures is a tree diagram that starts at a very high level, for example a company, then branches out and splits into ever more detail. People often use the term “taxonomy” to label such trees; the term is stolen from biology, where similar structures are used to describe how species are related and grouped.
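As a minimal sketch of the tree structure described above, a taxonomy can be modelled as nodes that each hold a label and a list of more detailed child nodes. The class name `TaxonomyNode` and all of the labels used here (the company, the towns, the topics) are hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One node in a taxonomy tree: a label plus more specific child nodes."""

    label: str
    children: list[TaxonomyNode] = field(default_factory=list)

    def add(self, label: str) -> TaxonomyNode:
        """Attach a child node and return it so that branches can be extended."""
        child = TaxonomyNode(label)
        self.children.append(child)
        return child

    def leaves(self) -> list[str]:
        """Return the labels of the most detailed (leaf) nodes below this one."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]


# Hypothetical example data: start at the company and branch into more detail.
company = TaxonomyNode("Example Company")
regions = company.add("Regions")
north_west = regions.add("North West")
north_west.add("Blackpool")
north_west.add("Preston")
topics = company.add("Topics")
renewables = topics.add("Renewable energy")
renewables.add("wind power")
renewables.add("solar power")

print(north_west.leaves())
print(company.leaves())
```

Asking a branch for its leaves shows how a loose term such as “North West” can be expanded into the concrete places one particular business means by it.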
In the world of corporate business intelligence this problem is equally challenging. Imagine you are a salesperson trying to answer the second question in our example using one of the standard internet search engines. Let’s say there are 500 towns in your company’s definition of the “North West region”, that your prospect list contains 500 companies, and that there are 10 different terms that could mean renewable energy. In theory you would have 500 × 500 × 10 = 2,500,000 separate searches to do in order to cover all of the combinations of interest. In addition to doing millions of individual searches, you would also want to filter the results against what you saw the last time you ran the search; obviously you only want to see new stuff!
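The arithmetic above can be checked with a short sketch. The lists below are placeholders sized to match the figures in the text (500 towns, 500 prospects, 10 terms); the generated names, the query format and the `only_new` helper are assumptions made for illustration, not a description of how any real search engine is driven.

```python
from itertools import islice, product

# Placeholder lists sized to match the example; the names are invented.
towns = [f"town-{i}" for i in range(500)]
prospects = [f"company-{i}" for i in range(500)]
energy_terms = [f"term-{i}" for i in range(10)]

# Every prospect must be paired with every term and every town.
total_searches = len(towns) * len(prospects) * len(energy_terms)
print(total_searches)  # 2500000

# Build the query strings lazily so that 2.5 million of them are never
# held in memory at once; here we only look at the first three.
queries = (
    f'"{prospect}" "{term}" "{town}"'
    for prospect, term, town in product(prospects, energy_terms, towns)
)
for query in islice(queries, 3):
    print(query)


def only_new(results, seen):
    """Return the results not seen on earlier runs and remember them in `seen`."""
    fresh = [result for result in results if result not in seen]
    seen.update(fresh)
    return fresh
```

Running `only_new` against the same `seen` set on each pass keeps only the results that have appeared since the previous search.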
An example of a typical taxonomy is shown below:
Even with this trivial example it is clear why this is a difficult challenge for most organisations. At Artesian we recognise this challenge; our approach to solving it is to provide two key weapons that corporate consumers can use to keep better surveillance of relevant internet content.
These two tools are:
1. A mechanism to define your own taxonomy.
2. A framework that does the “brute-force” indexing and searching so that you don’t have to.
Designing and building such software enables a number of new capabilities that were previously impractical and beyond the reach of normal corporate information consumers. Firstly, it allows them to define their own taxonomy, i.e. geographies, customers, prospects, products etc. Secondly, it allows them to index content from the internet using this taxonomy, building indexes which can be queried just like current public search engines except, critically, that these indexes already understand the semantics of that particular business, which makes answering question two in our example trivial.
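To make the idea concrete, here is a minimal sketch of a taxonomy-driven index. It is not Artesian's implementation: the taxonomy entries, the sample documents, the function names and the naive substring matching are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical business-specific taxonomy: each concept maps to the surface
# terms that should count as a mention of that concept.
TAXONOMY = {
    "prospect:Acme Ltd": ["acme"],
    "topic:renewable energy": ["renewable energy", "wind farm", "solar"],
    "region:North West": ["blackpool", "preston", "lancaster"],
}


def tag(text):
    """Return the taxonomy concepts mentioned in a piece of text.

    Uses naive lowercase substring matching, which is enough for a sketch.
    """
    lowered = text.lower()
    return {
        concept
        for concept, terms in TAXONOMY.items()
        if any(term in lowered for term in terms)
    }


def build_index(documents):
    """Map each concept to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for concept in tag(text):
            index[concept].add(doc_id)
    return index


def query(index, *concepts):
    """Return the ids of documents that mention every requested concept."""
    matches = [index.get(concept, set()) for concept in concepts]
    return set.intersection(*matches) if matches else set()


# Invented sample content standing in for pages gathered from the internet.
documents = {
    "doc1": "Acme Ltd plans a wind farm near Blackpool.",
    "doc2": "Acme opens a new office in London.",
    "doc3": "Solar subsidies announced for Preston businesses.",
}
index = build_index(documents)

# "What are my prospects doing about renewable energy in the North West?"
print(query(index, "prospect:Acme Ltd", "topic:renewable energy", "region:North West"))
```

Because the business's own meanings are applied once at indexing time, the second question from the example becomes a single lookup instead of millions of separate searches.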