September 12, 2007
Tim Converse on Proximity is a Hack
Powerset's Tim Converse wrote a great article entitled "Proximity is a Hack."
In the article, Tim says that the two biggest improvements in web search were the use of links (including anchor text) and term proximity. The article explores the benefits of term proximity and argues that it works to the extent that it approximates linguistic relationships in the text.
He concludes that natural language processing of the documents should be able to capture linguistic relationships more accurately, even when the query itself is in keywordese (as opposed to a natural language query with internal linguistic structure).
To recap: proximity is both a wonderfully powerful relevance feature, and a total hack. It helps enormously, but it’s not what you really want, it’s just sorta somewhat correlated with what you really want. What you need for what you really want is the underlying structure of all that web content: the real syntactic structure of the sentences, how the sentences connect to each other, how the facts relate, and (maybe) how the discourse flows and the topics connect. We’ve squeezed all the juice we can out of webpages considered as word-vectors; now it’s time to parse this stuff and get at the real structure.

Can that be done? A couple of years ago I would have said no, but I hadn’t seen the PARC natural language technology then, and didn’t know that an effort this concerted and well-funded was on the way. Now, do I think that Powerset will do it? I still don’t know, frankly - there’s so much more to do to make it real and debugged and scaled the way it needs to be. But it’s clear to me that the next big thing in web search is either this or something a whole lot like this, and I think we have the best shot of anyone. And that’s why I’m at Powerset.
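To make "term proximity" a little more concrete, here is a minimal sketch of one common proximity feature: the length of the shortest window of text that contains every query term, turned into a score by dividing the number of distinct query terms by that span. This is an illustration only, not Powerset's or any search engine's actual ranking code; the function names (`tokenize`, `min_cover_span`, `proximity_score`) and the scoring formula are assumptions chosen for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_cover_span(doc_tokens, query_terms):
    """Return the length (in tokens) of the shortest window of doc_tokens that
    contains every distinct query term, or None if some term never appears."""
    needed = set(query_terms)
    if not needed:
        return None
    counts = Counter()  # occurrences of each query term inside the current window
    have = 0            # number of distinct query terms currently in the window
    best = None
    left = 0
    for right, tok in enumerate(doc_tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                have += 1
        # Shrink the window from the left while it still covers every term.
        while have == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            ltok = doc_tokens[left]
            if ltok in needed:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    have -= 1
            left += 1
    return best


def proximity_score(doc_tokens, query_terms):
    """Hypothetical score: 1.0 when all terms are adjacent, smaller as they
    spread out, 0.0 when some term is missing."""
    span = min_cover_span(doc_tokens, query_terms)
    if span is None:
        return 0.0
    return len(set(query_terms)) / span
```

The example also shows why Tim calls proximity a hack. Word order and syntax are invisible to this feature, so two sentences with opposite meanings get the same score:

```python
q = ["dog", "bites", "man"]
print(proximity_score(tokenize("Dog bites man"), q))   # 1.0
print(proximity_score(tokenize("Man bites dog"), q))   # 1.0
print(proximity_score(tokenize("A dog, the story goes, eventually bites a man"), q))  # 0.375
```

Closeness is only "sorta somewhat correlated" with who did what to whom; recovering that requires actually parsing the sentence.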
The article is definitely good reading for people interested in search and the potential benefits of NLP.
Posted by barney at September 12, 2007 9:03 PM
This entry was posted in the following categories: Human Language Technology, Powerset, Search