<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Powerset and Natural Language Search</title>
	<atom:link href="http://www.barneypell.com/2006/10/powerset-and-natural-language-search/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/</link>
	<description></description>
	<lastBuildDate>Fri, 07 May 2010 13:32:55 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: stufflix</title>
		<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/comment-page-1/#comment-44</link>
		<dc:creator>stufflix</dc:creator>
		<pubDate>Tue, 03 Apr 2007 14:18:54 +0000</pubDate>
		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=76#comment-44</guid>
		<description>I remember as a student at Cambridge Univ ORL the video retrieval project that was searching for passages in video content that provided specific visual content, the procurement of which was obtained by searching audio sentences information mostly I think. I think for English the task would be far easier than for other inflexion based languages though so I am concerned that the expansion of the project would be far more complex than the &quot;add another same type of google box approach to search country specific pages&quot;.
Well...I think a novel UI approach would also help greatly to assist in the construction of the sentences to be used as search artifacts and I think might be quite key to the project.
</description>
		<content:encoded><![CDATA[<p>I remember as a student at Cambridge Univ ORL the video retrieval project that was searching for passages in video content that provided specific visual content, the procurement of which was obtained by searching audio sentences information mostly I think. I think for English the task would be far easier than for other inflexion based languages though so I am concerned that the expansion of the project would be far more complex than the &#8220;add another same type of google box approach to search country specific pages&#8221;.<br />
Well&#8230;I think a novel UI approach would also help greatly to assist in the construction of the sentences to be used as search artifacts and I think might be quite key to the project.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Barney Pell</title>
		<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/comment-page-1/#comment-42</link>
		<dc:creator>Barney Pell</dc:creator>
		<pubDate>Fri, 10 Nov 2006 15:45:30 +0000</pubDate>
		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=76#comment-42</guid>
		<description>Patrick: You are right that Google is now using specific passage retrieval algorithms (à la Bill Woods) that take word proximity, variation, and order into account.  It certainly makes the search better. That said, you are right that there is much more to do even for Google.
Natural language search, for me, means the ability to use natural language to do anything you would want to do with a search engine. This includes informational, navigational, and transactional queries. It also includes statements of intent and context.
Moving beyond just document search, including ideas you suggest, is highly relevant.
</description>
		<content:encoded><![CDATA[<p>Patrick: You are right that Google is now using specific passage retrieval algorithms (à la Bill Woods) that take word proximity, variation, and order into account.  It certainly makes the search better. That said, you are right that there is much more to do even for Google.<br />
Natural language search, for me, means the ability to use natural language to do anything you would want to do with a search engine. This includes informational, navigational, and transactional queries. It also includes statements of intent and context.<br />
Moving beyond just document search, including ideas you suggest, is highly relevant.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick Herron</title>
		<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/comment-page-1/#comment-41</link>
		<dc:creator>Patrick Herron</dc:creator>
		<pubDate>Fri, 10 Nov 2006 00:40:47 +0000</pubDate>
		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=76#comment-41</guid>
		<description>Without futzing around we might safely (though possibly wrongly) assume the keyword-ese search engine not mentioned by name here is Google.  Google might be the right commercial standard, so for the purposes of conversation I&#039;ll take the risk of deluding myself and assume you really mean Google.
Anyway, this notion of keywordese where Google strips stop words (function words really) is somewhat strawlike, for a lack of a better word (or a lack of a word altogether).  A simple test of Google suffices.  If you compare search results for &quot;a who&quot; and &quot;the who&quot; you will notice the results are different.  Same too when I tried &quot;a mule&quot; and &quot;the mule.&quot;  If the functional word didn&#039;t matter, then it would be stripped and mule would be the query inside the black box.  But that&#039;s not the case, so Google does not really fit the description.  But what other search engines truly matter today, assuming a mass-market audience?
Google&#039;s search appears to expand the query by what I might call both functional ambiguity and boolean ambiguity as well as proximity-based weightings.  Further, the terms in the title tags are more important than terms in other places.  In this sense, Google already appears to consider functional words, collocation/proximity, and even document context as well.  So by your definition of natural language search it seems that Google is already there.
But it isn&#039;t, obviously.
The value of Google&#039;s search is based almost entirely on the search/searcher context.  If the searcher has a weakly formed preconception of a satisfactory search result (namely because the person is not a domain expert) and the search string is dumb (1-2 words) chances are the search results will be satisfactory--given PageRank over a large enough network.  But if you begin pumping up user domain expertise then the language of the search query will necessarily become richer while the user&#039;s standards for satisfaction will rise.  Finally, if the user is a domain expert s/he will likely already know the right subnetworks to stick to.  Google frankly begins to suffer in such an environment as I witnessed first-hand in a study I designed for a large statistical software company.  People who rely on their support search either search like Google users and flail, or they search with less keywordese-like language (longer query strings, more functional words) and experience greater success.  Several search engines including Google were tested.
I remain befuddled by the phrase &quot;natural language search.&quot;  If I had to imagine what is meant by natural language I would imagine certain things to be present that are typically absent: (1) user context modeling; (2) semantic network expansion (plug in WordNet or something like it and expand your input).
But efforts at disambiguation will diminish naturalness for reasons too extensive to discuss here.  Even POS tagging strips out POS ambiguities which are rich in the English language.  So grappling with ambiguity using an estimate of likely semantic preferences for specific users might get you to naturalness.  But then you might also need recognizing, making, and refining mistakes.  Or an ability to recognize the differences in meaning of a term over time.
What I wonder even more about is, why stick with search?  Search seems to be a marriage of keywordese-in to document-out.  Information extraction at least breaks the document barrier but it still ends up tied to the document.  If you&#039;re going to have enough computing power under the hood to have natural language search queries (what I imagine that means) and rewrite/expand queries you might as well go for intelligently processing results into a coherent natural language whole.   How about inductive reasoning (e.g., Muggleton&#039;s work) or multidocument summarization approaches?  Something synthetic, something that generates new information from the din of the old, something far more valuable and compact.  Of course that&#039;s easier for me to say than to do however.
I&#039;m eager to see this search engine.  Good luck.
</description>
		<content:encoded><![CDATA[<p>Without futzing around we might safely (though possibly wrongly) assume the keyword-ese search engine not mentioned by name here is Google.  Google might be the right commercial standard, so for the purposes of conversation I&#8217;ll take the risk of deluding myself and assume you really mean Google.<br />
Anyway, this notion of keywordese where Google strips stop words (function words really) is somewhat strawlike, for a lack of a better word (or a lack of a word altogether).  A simple test of Google suffices.  If you compare search results for &#8220;a who&#8221; and &#8220;the who&#8221; you will notice the results are different.  Same too when I tried &#8220;a mule&#8221; and &#8220;the mule.&#8221;  If the functional word didn&#8217;t matter, then it would be stripped and mule would be the query inside the black box.  But that&#8217;s not the case, so Google does not really fit the description.  But what other search engines truly matter today, assuming a mass-market audience?<br />
Google&#8217;s search appears to expand the query by what I might call both functional ambiguity and boolean ambiguity as well as proximity-based weightings.  Further, the terms in the title tags are more important than terms in other places.  In this sense, Google already appears to consider functional words, collocation/proximity, and even document context as well.  So by your definition of natural language search it seems that Google is already there.<br />
But it isn&#8217;t, obviously.<br />
The value of Google&#8217;s search is based almost entirely on the search/searcher context.  If the searcher has a weakly formed preconception of a satisfactory search result (namely because the person is not a domain expert) and the search string is dumb (1-2 words) chances are the search results will be satisfactory&#8211;given PageRank over a large enough network.  But if you begin pumping up user domain expertise then the language of the search query will necessarily become richer while the user&#8217;s standards for satisfaction will rise.  Finally, if the user is a domain expert s/he will likely already know the right subnetworks to stick to.  Google frankly begins to suffer in such an environment as I witnessed first-hand in a study I designed for a large statistical software company.  People who rely on their support search either search like Google users and flail, or they search with less keywordese-like language (longer query strings, more functional words) and experience greater success.  Several search engines including Google were tested.<br />
I remain befuddled by the phrase &#8220;natural language search.&#8221;  If I had to imagine what is meant by natural language I would imagine certain things to be present that are typically absent: (1) user context modeling; (2) semantic network expansion (plug in WordNet or something like it and expand your input).<br />
But efforts at disambiguation will diminish naturalness for reasons too extensive to discuss here.  Even POS tagging strips out POS ambiguities which are rich in the English language.  So grappling with ambiguity using an estimate of likely semantic preferences for specific users might get you to naturalness.  But then you might also need recognizing, making, and refining mistakes.  Or an ability to recognize the differences in meaning of a term over time.<br />
What I wonder even more about is, why stick with search?  Search seems to be a marriage of keywordese-in to document-out.  Information extraction at least breaks the document barrier but it still ends up tied to the document.  If you&#8217;re going to have enough computing power under the hood to have natural language search queries (what I imagine that means) and rewrite/expand queries you might as well go for intelligently processing results into a coherent natural language whole.   How about inductive reasoning (e.g., Muggleton&#8217;s work) or multidocument summarization approaches?  Something synthetic, something that generates new information from the din of the old, something far more valuable and compact.  Of course that&#8217;s easier for me to say than to do however.<br />
I&#8217;m eager to see this search engine.  Good luck.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lohit</title>
		<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/comment-page-1/#comment-40</link>
		<dc:creator>lohit</dc:creator>
		<pubDate>Fri, 06 Oct 2006 13:35:02 +0000</pubDate>
		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=76#comment-40</guid>
		<description>Good luck with the project.
It would be really interesting to see to what extent NLP is used for search. Having a larger dataset like the web makes it much more challenging. Will there be any beta version for registered users to try the service? I would be interested in seeing the results.
</description>
		<content:encoded><![CDATA[<p>Good luck with the project.<br />
It would be really interesting to see to what extent NLP is used for search. Having a larger dataset like the web makes it much more challenging. Will there be any beta version for registered users to try the service? I would be interested in seeing the results.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: direwolff</title>
		<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/comment-page-1/#comment-39</link>
		<dc:creator>direwolff</dc:creator>
		<pubDate>Fri, 06 Oct 2006 01:20:44 +0000</pubDate>
		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=76#comment-39</guid>
		<description>Also note that Keywordese is just fine when you know exactly what you&#039;re looking for.  If I type &quot;Christophe&#039;s Restaurant Sausalito, CA&quot; into Google, the top result gets me what I want, their contact info.  Search improvements are required when the task is a discovery one, where I&#039;m not exactly sure what I&#039;m looking for so I need something that will understand what I mean.  In this regard, Don Dodge&#039;s recommendation above, to focus on specialized vertical markets is great advice worth considering.
Good luck to you on this Barney, it&#039;s a heck of a play.
</description>
		<content:encoded><![CDATA[<p>Also note that Keywordese is just fine when you know exactly what you&#8217;re looking for.  If I type &#8220;Christophe&#8217;s Restaurant Sausalito, CA&#8221; into Google, the top result gets me what I want, their contact info.  Search improvements are required when the task is a discovery one, where I&#8217;m not exactly sure what I&#8217;m looking for so I need something that will understand what I mean.  In this regard, Don Dodge&#8217;s recommendation above, to focus on specialized vertical markets is great advice worth considering.<br />
Good luck to you on this Barney, it&#8217;s a heck of a play.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: azeem</title>
		<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/comment-page-1/#comment-38</link>
		<dc:creator>azeem</dc:creator>
		<pubDate>Thu, 05 Oct 2006 13:33:33 +0000</pubDate>
		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=76#comment-38</guid>
		<description>Barney
Totally with you on this. Really clear exposition of today&#039;s search.
Some points:
1. Average keyword length is drifting up. When I ran marketing at Albert, which did some query disambiguation while heading toward (but not reaching) decent NLP, average internet searches were 1.4 words trending towards 2. (This is data looking at 1996 to 1999). That is, I guess, because users are getting more sophisticated, data sets are broader so demanding (and rewarding) longer searches.
According to Yahoo!, average query length has now reached 3.3 words. (See &lt;a href=&quot;http://blogs.zdnet.com/micro-markets/index.php?p=27&quot; rel=&quot;nofollow&quot;&gt;http://blogs.zdnet.com/micro-markets/index.php?p=27&lt;/a&gt; )
So contrary to what Danny O&#039;Sullivan has said, user behaviour can and does change.
2. I think we have fallen into the category of learned helplessness about search. We basically accept what we&#039;re given and take what we can get. The lack of innovation in search--and let&#039;s face it, there hasn&#039;t been any since PageRank--has taught us not to rebel.
aa
</description>
		<content:encoded><![CDATA[<p>Barney<br />
Totally with you on this. Really clear exposition of today&#8217;s search.<br />
Some points:<br />
1. Average keyword length is drifting up. When I ran marketing at Albert, which did some query disambiguation while heading toward (but not reaching) decent NLP, average internet searches were 1.4 words trending towards 2. (This is data looking at 1996 to 1999). That is, I guess, because users are getting more sophisticated, data sets are broader so demanding (and rewarding) longer searches.<br />
According to Yahoo!, average query length has now reached 3.3 words. (See <a href="http://blogs.zdnet.com/micro-markets/index.php?p=27" rel="nofollow">http://blogs.zdnet.com/micro-markets/index.php?p=27</a> )<br />
So contrary to what Danny O&#8217;Sullivan has said, user behaviour can and does change.<br />
2. I think we have fallen into the category of learned helplessness about search. We basically accept what we&#8217;re given and take what we can get. The lack of innovation in search&#8211;and let&#8217;s face it, there hasn&#8217;t been any since PageRank&#8211;has taught us not to rebel.<br />
aa</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Don Dodge</title>
		<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/comment-page-1/#comment-37</link>
		<dc:creator>Don Dodge</dc:creator>
		<pubDate>Thu, 05 Oct 2006 08:50:48 +0000</pubDate>
		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=76#comment-37</guid>
		<description>Barney, Great post, excellent analysis of the search problem. We may have met years ago when I was director of engineering at AltaVista. We had a series of meetings with the SRI guys about NLP.
My advice is to focus on enterprise search or specialized vertical search markets. Your unique technology will be more appreciated and valued by those customers. If you are really stuck on doing consumer search then I would suggest you try to specialize in News Search, People Search, Medical Search, or some other vertical where NLP power can be an advantage.
Good luck!
Don Dodge
The Next Big Thing
&lt;a href=&quot;http://dondodge.typepad.com/the_next_big_thing/2006/10/powerset_natura.html&quot; rel=&quot;nofollow&quot;&gt;http://dondodge.typepad.com/the_next_big_thing/2006/10/powerset_natura.html&lt;/a&gt;
</description>
		<content:encoded><![CDATA[<p>Barney, Great post, excellent analysis of the search problem. We may have met years ago when I was director of engineering at AltaVista. We had a series of meetings with the SRI guys about NLP.<br />
My advice is to focus on enterprise search or specialized vertical search markets. Your unique technology will be more appreciated and valued by those customers. If you are really stuck on doing consumer search then I would suggest you try to specialize in News Search, People Search, Medical Search, or some other vertical where NLP power can be an advantage.<br />
Good luck!<br />
Don Dodge<br />
The Next Big Thing<br />
<a href="http://dondodge.typepad.com/the_next_big_thing/2006/10/powerset_natura.html" rel="nofollow">http://dondodge.typepad.com/the_next_big_thing/2006/10/powerset_natura.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Bryant</title>
		<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/comment-page-1/#comment-36</link>
		<dc:creator>Steve Bryant</dc:creator>
		<pubDate>Thu, 05 Oct 2006 08:21:45 +0000</pubDate>
		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=76#comment-36</guid>
		<description>Hey Barney, I totally dig what you&#039;re saying. I have much respect for you for giving the search engine game a go. But please don&#039;t mischaracterize what I said, which is that competing with Google involves more than search technology. That&#039;s all. If you look at the history of my posts you&#039;ll see I&#039;m not an unabashed Google lover by any stretch of the imagination.
I wish you all the success in the world.
-Steve
p.s. I would take issue with &quot;users don&#039;t like typing and will not enter more than 2-3 words.&quot; Most non-tech savvy people I know type in godawful long sentences, which then break the engine. Either way though, your point stands.
</description>
		<content:encoded><![CDATA[<p>Hey Barney, I totally dig what you&#8217;re saying. I have much respect for you for giving the search engine game a go. But please don&#8217;t mischaracterize what I said, which is that competing with Google involves more than search technology. That&#8217;s all. If you look at the history of my posts you&#8217;ll see I&#8217;m not an unabashed Google lover by any stretch of the imagination.<br />
I wish you all the success in the world.<br />
-Steve<br />
p.s. I would take issue with &#8220;users don&#8217;t like typing and will not enter more than 2-3 words.&#8221; Most non-tech savvy people I know type in godawful long sentences, which then break the engine. Either way though, your point stands.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: anthropocentric</title>
		<link>http://www.barneypell.com/2006/10/powerset-and-natural-language-search/comment-page-1/#comment-35</link>
		<dc:creator>anthropocentric</dc:creator>
		<pubDate>Thu, 05 Oct 2006 01:34:40 +0000</pubDate>
		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=76#comment-35</guid>
		<description>What you are saying makes sense.  It is clear that the search user interface will be improved in the coming years.
Encouraging users to make use of the stopwords and providing better results based on that usage is really interesting:
books by Bradbury about carousels
sales data for Apple iPod
recipe for onion soup
I wonder what proportion of searches will actually benefit from this enhancement?  In many cases, the inclusion of the stopword and the added meaning that it brings is just not helpful (example: &quot;recipe for onion soup&quot; vs. &quot;recipe onion soup&quot;).
Is this a great new feature to incrementally improve the best search engines or is this truly a paradigm shift and a new type of search engine?
</description>
		<content:encoded><![CDATA[<p>What you are saying makes sense.  It is clear that the search user interface will be improved in the coming years.<br />
Encouraging users to make use of the stopwords and providing better results based on that usage is really interesting:<br />
books by Bradbury about carousels<br />
sales data for Apple iPod<br />
recipe for onion soup<br />
I wonder what proportion of searches will actually benefit from this enhancement?  In many cases, the inclusion of the stopword and the added meaning that it brings is just not helpful (example: &#8220;recipe for onion soup&#8221; vs. &#8220;recipe onion soup&#8221;).<br />
Is this a great new feature to incrementally improve the best search engines or is this truly a paradigm shift and a new type of search engine?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

