29 July 2004

<noun> <verb> <adj>

Ftrain (Paul Ford) has written a fictional article on Google. Written from the vantage of 2009, it explains how Google became King of the World by promoting and harnessing the semantic Web--a universal medium for information exchange that gives meaning, in a manner understandable by machines, to the content of documents on the Web. A concept near and dear to any linguist's or amateur linguist's heart.

Paul Ford designed Harper's site and is one of those Web design gurus I go on about. Actually, he's a guru's guru. Harper's site is a work of genius: as elegant as it is difficult to fully understand. The explanation of formatting and logic (for the perplexed or curious) goes into detail about why some links appear different from others, how some pages are aggregates of pieces of other pages, how the hierarchy is created, etc.

Look at the Connections page from the site. Groups of high-level concepts are presented, and as you drill down you eventually get to a timeline of events. Connections -> Supernatural Beings -> Gods: 2004, Jan 2, God told Pat Robertson that George W. Bush would be reelected--with every item linked and referenced to its source material. It's what hypertext should be, and it's very similar in spirit to WordNet.

WordNet can be thought of as a semantic dictionary. Entries point "sideways" to their synonyms, up to their hypernyms ("X is a kind of"), down to their hyponyms ("are kinds of X"), internally to their meronyms ("X is made up of"), and externally to their holonyms ("are made of X"). And that's just for nouns. See, for instance, the entry on store, and look at its hierarchy of hypernyms:

store
=> mercantile establishment
=> place of business
=> establishment
=> structure
=> artifact
=> object
=> entity
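
Just to poke at the data, here's a rough sketch of walking that same hypernym chain with NLTK's WordNet interface. NLTK isn't mentioned in Ford's article or on the WordNet page; it's only here as an illustration of how the entries are structured:

    from nltk.corpus import wordnet as wn   # requires the WordNet corpus: nltk.download('wordnet')

    synset = wn.synsets('store', pos=wn.NOUN)[0]   # first noun sense of "store"
    while True:
        print(synset.name(), '-', synset.definition())
        hypernyms = synset.hypernyms()             # the "X is a kind of ..." links
        if not hypernyms:
            break
        synset = hypernyms[0]                      # climb one level up the hierarchy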

The point of Ford's article is that putting the expansive indexing of Google together with semantic markup would produce intelligent searching. Anyone who's ever played with Prolog and expert systems will recognize the immense potential of having those systems dynamically generated from terabytes of content. Expert systems are great but tedious to program (and therefore only dubiously labeled AI). Google plus semantic markup would do it automatically:

So where did Google Marketplace Search get its information? The same way Google got all of its information - by crawling through the entire web and indexing what it found. Except now it was looking for RDDL files, which pointed to RDF files, which contained logical statements...
(RDDL files are directories pointing to RDF files, which contain semantic information in XML. They contain simple statements about objects (e.g. "Scott is selling a guitar") that can be combined with other statements ("a guitar is a musical instrument") using predicate calculus--the sorcerer's apprentice of natural language processing. Combining those statements with logic rules could yield a semantically rich Web.)
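
To make that concrete, here's a toy sketch (mine, not Ford's) of the kind of combination he's describing: a couple of subject-predicate-object statements plus one rule, from which a new fact falls out:

    # Toy facts as (subject, predicate, object) triples -- the shape of RDF statements.
    facts = {
        ('Scott', 'sells', 'guitar'),
        ('guitar', 'is_a', 'musical_instrument'),
    }

    def infer(facts):
        # Rule: if X sells Y and Y is_a Z, then X sells Z.  Repeat until nothing new appears.
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for (x, p, y) in list(derived):
                if p != 'sells':
                    continue
                for (a, q, z) in list(derived):
                    if q == 'is_a' and a == y and (x, 'sells', z) not in derived:
                        derived.add((x, 'sells', z))
                        changed = True
        return derived

    print(infer(facts))   # now also contains ('Scott', 'sells', 'musical_instrument')

Scale that from two hand-typed statements to terabytes of crawled RDF and you have something like Ford's Google Marketplace Search.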

Imagine the WordNet hierarchy codified into every relationship on the Internet. If you're selling a guitar, a search index would understand "sell" and understand every synonym for "guitar" along with every meronym (do you need strings for that?). The seller would be linked up a chain of trustworthiness to banks in the same way that certificates are validated up a chain to Verisign. All of these semantic relationships would be embedded automatically when the data is created.
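
Assuming WordNet as the vocabulary (and again leaning on NLTK purely for illustration), that kind of query expansion might look something like:

    from nltk.corpus import wordnet as wn

    def expand(term):
        # Gather synonyms plus part-meronyms ("things the term is made of") for a noun.
        expansions = set()
        for synset in wn.synsets(term, pos=wn.NOUN):
            expansions.update(lemma.name() for lemma in synset.lemmas())
            for part in synset.part_meronyms():
                expansions.update(lemma.name() for lemma in part.lemmas())
        return expansions

    print(expand('guitar'))   # synonyms for "guitar" plus whatever parts WordNet lists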

There are many doubts about the semantic Web (could it really provide anything beyond what's already available?), but the elegance of the concept makes it very compelling.

[ posted by sstrader on 29 July 2004 at 8:54:03 PM in Programming ]