May 13, 2008

Powerset Launches; Most Ambitious Semantic Search to Date

The long-awaited (and long-hyped) semantic search engine Powerset has finally launched. To start, Powerset is running its search engine against Wikipedia and including reference data from Freebase.

Powerset uses semantic analysis in a few ways:
There is a natural language query box, so it can interpret questions like "Which companies has Microsoft acquired?", matching entities and facts from your search request to those in the Wikipedia data.

In displaying results, it first attempts to disambiguate multiple references to your search and arranges the content accordingly. Next, it applies fact extraction, identifying relationships between entities and displaying what it calls Factz.


In this example, we see how Powerset handles the "disambiguation 101" example of the word Java, starting with the island, the programming language, a band, and more. Next, it shows Factz about Java - relationships for programmed, used and supported.

The Factz seem a bit simplistic - it appears they're just extracting noun-verb pairs. So, while the initial Factz for Java make sense (such as programmed-language), as you dig deeper you find examples like "jarred confdesigner", whatever that may mean. The fact was apparently extracted from the sentence: 'ConfDesigner can be started directly via "java -jar confdesigner.jar" (because of added jar-Manifest).'
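To illustrate how shallow pairing can go wrong, here is a minimal sketch in Python. It is not Powerset's method, which I can only guess at; the tiny verb list, the stopword list and the function names are all hypothetical. It pairs each word it believes is a verb with the next content word, and because "jar" can be read as a verb, the command-line flag in the sentence above turns into a "fact".

```python
import re

# Hypothetical mini-lexicon: words a shallow tagger might accept as verbs.
# "jar" is included on purpose, since "to jar" is a legitimate English verb.
VERBS = {"programmed", "used", "supported", "started", "added", "jar"}

# Hypothetical stopword list: function words that are skipped when pairing.
STOPWORDS = {"can", "be", "directly", "via", "because", "of", "the", "a", "is", "in"}


def tokenize(sentence):
    """Lowercase the sentence and keep only runs of letters."""
    return re.findall(r"[a-z]+", sentence.lower())


def naive_pairs(sentence):
    """Pair every 'verb' with the next token that is neither verb nor stopword."""
    tokens = tokenize(sentence)
    pairs = []
    for i, token in enumerate(tokens):
        if token not in VERBS:
            continue
        for candidate in tokens[i + 1:]:
            if candidate not in VERBS and candidate not in STOPWORDS:
                pairs.append((token, candidate))
                break
    return pairs


sensible = naive_pairs("Developers programmed the language")
noisy = naive_pairs(
    'ConfDesigner can be started directly via "java -jar confdesigner.jar" '
    "(because of added jar-Manifest)."
)
print(sensible)  # [('programmed', 'language')]
print(("jar", "confdesigner") in noisy)  # True
```

The first sentence yields a reasonable pair, while the second produces ("jar", "confdesigner") because nothing in the sketch knows that "-jar" is a flag rather than a verb. A real engine would need syntactic and contextual analysis to rule that out.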

I show this example not to disparage Powerset, but rather to point out how difficult it is to do semantic analysis on a massive corpus of text like Wikipedia. With a homogeneous data set, such as business news or bioinformatics data, you can tune a semantic engine for maximum precision and recall. With such a general data set as Wikipedia, it's incredibly hard to consistently generate strong results.
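For readers less familiar with the two measures: precision is the share of extracted facts that are correct, and recall is the share of correct facts that were actually extracted. A small sketch, using made-up fact pairs and a hypothetical hand-built "gold" set purely for illustration:

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for a set of extracted facts against a gold set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only; these are not real Powerset results.
extracted = {
    ("programmed", "language"),
    ("used", "java"),
    ("jar", "confdesigner"),  # the spurious one
}
gold = {
    ("programmed", "language"),
    ("used", "java"),
    ("supported", "platform"),
    ("developed", "sun"),
}

precision, recall = precision_recall(extracted, gold)
print(round(precision, 2), recall)  # 0.67 0.5
```

Tuning for a narrow domain lets you push both numbers up at once; on a general corpus, improving one tends to cost you the other.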

The Powerset user interface is very clean. For any Wikipedia page, they add an "Article Outline" floating toolbar. This Ajax-based toolbar can either display the standard outline of the page (simply parsing the Wikipedia tags around heading sections) or, when you click on "Show Factz", display the facts that it uncovers within each section as shown here. I think this adds a lot of value to Wikipedia.
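Building the plain outline is the easy part. As a rough sketch, assuming the page is available as wikitext where headings look like "== History ==" (I don't know whether Powerset works from wikitext or from the rendered HTML), a single regular expression is enough. The function name and sample text below are hypothetical.

```python
import re

# A heading is a line wrapped in matching runs of 2 to 6 equals signs.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def outline(wikitext):
    """Return (level, title) tuples for each heading, in document order."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING.finditer(wikitext)]


sample = (
    "Intro text\n"
    "== History ==\n"
    "Some text\n"
    "=== Early years ===\n"
    "More text\n"
    "== Geography ==\n"
)
print(outline(sample))
# [(2, 'History'), (3, 'Early years'), (2, 'Geography')]
```

The Factz view is the hard part, since it has to attach extracted relationships to each of these sections.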

Overall, I expect to use Powerset to search Wikipedia going forward, rather than going to the underlying site. While other power users may do the same, the bulk of Wikipedia's traffic comes from Google, and I expect that to continue.

I don't think the team at Powerset expects to make their money as a better search tool for Wikipedia in the long run. Rather, it's a proof-of-concept to demonstrate the capabilities of semantic search. Powerset has delivered a very compelling site search engine. While others may try to compare Powerset to Google, that's not a realistic comparison. Barney Pell and company have set their ambitions on replacing Google, but in the long run, I think they'll find their niche will be in semantic search for the enterprise, a web site or a specific domain.

I've been pretty close to the semantic search and text mining space since my early days at ClearForest in 2000, and while the promise of semantic analysis has always been great, the actual deliverables have consistently come up short.

Powerset seems to be making a credible claim to be the first legitimate semantic search engine of any scale.

Powerset has been 2+ years in the making, which seems a lifetime in the persistent beta world of Web 2.0, but they've built on 15+ years of computational linguistics and seem to have a viable offering. It will be interesting to see where they take it next.

For more on Powerset, read John Blossom, TechCrunch and ReadWriteWeb.
