<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Barney Pell&#039;s Weblog &#187; Human Language Technology</title>
	<atom:link href="http://www.barneypell.com/archives/human-language-technology/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.barneypell.com</link>
	<description></description>
	<lastBuildDate>Thu, 17 Dec 2009 09:20:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Wolfram Alpha: A New Kind of Question-Answering System</title>
		<link>http://www.barneypell.com/2009/03/wolfram-alpha-a-new-kind-of-question-answering-system/</link>
		<comments>http://www.barneypell.com/2009/03/wolfram-alpha-a-new-kind-of-question-answering-system/#comments</comments>
		<pubDate>Mon, 23 Mar 2009 22:03:15 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Collective Intelligence]]></category>
		<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Web/Tech]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=124</guid>
		<description><![CDATA[There has been much excitement recently over the upcoming launch of Wolfram Alpha. This is a new question-answering system developed by Stephen Wolfram, inventor of Mathematica, and it is scheduled for a beta launch in May. Wolfram has been providing demos to industry insiders. I haven’t had a demo yet, but I have learned what [...]]]></description>
			<content:encoded><![CDATA[<p>There has been much excitement recently over the upcoming launch of Wolfram Alpha, a new question-answering system developed by Stephen Wolfram, inventor of Mathematica, which is scheduled for a beta launch in May. Wolfram has been giving demos to industry insiders. I haven’t had a demo yet, but I have learned what I could from articles by Nova Spivack (<a href="http://www.techcrunch.com/2009/03/08/wolfram-alpha-computes-answers-to-factual-questions-this-is-going-to-be-big/">“Wolfram Alpha computes answers to factual questions. This is going to be big”</a>) and Doug Lenat (<a href="http://www.semanticuniverse.com/blogs-i-was-positively-impressed-wolfram-alpha.html">“I was positively impressed with Wolfram Alpha”</a>). This weekend I also spoke with William Tunstall-Pedoe, CEO of <a href="http://www.trueknowledge.com/">True Knowledge</a>, who has also had a demo. Many of my examples and conclusions come from my conversation with William (thanks!). Since life is short and so is the attention of web readers, I&#8217;ll give the rest of my thoughts in bullet form.</p>
<p><strong>What it is: A new kind of question-answering system. </strong></p>
<p><strong>Examples</strong></p>
<ul>
<li> Math: &#8220;2+2&#8221;, plus a few other simple math questions such as &#8220;integrate xsin^4xdx&#8221; and &#8220;what is the square root of 18&#8221;.</li>
<li> Business: “gdp france” showed the amount and a graph of how it has changed over time. “gdp france/germany” showed a graph with both amounts and their ratio.</li>
<li> “internet users in Europe”: showed the total and a chart of current usage by country across Europe, highlighting the biggest and smallest.</li>
<li> “ISS”: generated a graphic rendition of the International Space Station orbiting Earth, updating in real time.</li>
<li> “tides in san Francisco”: showed a graph of tides over time; for the late-19th-century data points, the times were given in the local time standard in use back then. “tide NYC 11/12/1922” gave a single answer.</li>
<li> “weather”: showed a graph of average temperature in Cambridge, MA (where Stephen was when giving the demo), with the location inferred by reverse IP lookup.</li>
<li> Computational fluid dynamics: typing in the name of a specific aerofoil produced a picture of that aerofoil along with its differential equations.</li>
<li> Stock prices: “MSFT CSCO” showed a comparison chart.</li>
<li> Chemicals: for substances at a given temperature or pressure, it calculated physical properties. “H2SO4” showed a diagram and chemical properties. &#8220;5 molar h2so4&#8221; did something cool, though I don’t know what.</li>
<li> Genome sequences: “AGTAG” showed sequences from the human genome that match that pattern.</li>
<li> Data about people: “How old is Barack Obama” gives his current age. “When was Alan Turing born” gives the answer. “How old is Alan Turing” (a trick question) gives an error message with no human-readable explanation (True Knowledge, by contrast, tells you exactly why this is a trick question).</li>
</ul>
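<p>The Alan Turing example turns on whether the system notices why “How old is X” has no answer for someone who has died, and says so. The following is a minimal Python sketch, assuming a hypothetical hand-entered fact table (<code>PEOPLE</code>) and illustrative helper names (<code>years_between</code>, <code>age_answer</code>); it is not how Wolfram Alpha or True Knowledge is actually implemented.</p>

```python
from datetime import date

# Hypothetical, hand-entered facts for illustration only.
PEOPLE = {
    "barack obama": {"born": date(1961, 8, 4), "died": None},
    "alan turing": {"born": date(1912, 6, 23), "died": date(1954, 6, 7)},
}


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end, accounting for the birthday."""
    birthday_not_reached = (start.month, start.day) > (end.month, end.day)
    return end.year - start.year - int(birthday_not_reached)


def age_answer(name: str, today: date) -> str:
    """Answer 'How old is X', explaining the trick case instead of erroring."""
    person = PEOPLE.get(name.lower())
    if person is None:
        return f"I don't know who {name} is."
    if person["died"] is None:
        return f"{name} is {years_between(person['born'], today)} years old."
    # Trick question: build an explanation from the same facts.
    died = person["died"]
    return (
        f"{name} died on {died.isoformat()} at age "
        f"{years_between(person['born'], died)}, so has no current age."
    )


print(age_answer("Barack Obama", date(2009, 3, 23)))  # -> Barack Obama is 47 years old.
print(age_answer("Alan Turing", date(2009, 3, 23)))
# -> Alan Turing died on 1954-06-07 at age 41, so has no current age.
```

<p>The design point is that the “no answer” case returns an explanation derived from the stored facts rather than a bare error.</p>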
<p><strong>Coverage of data: It answers questions over the following types of structured data:</strong></p>
<ul>
<li> static tables and databases (e.g. a database of internet usage by country by year)</li>
<li> dynamic data feeds (e.g. historical stock market data, position of space shuttle, weather)</li>
<li> numerical inference (e.g. math questions)</li>
<li> numerical computations and simulations (e.g. tides, astronomy, chemistry)</li>
</ul>
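<p>One way to picture how these four kinds of coverage fit together is a dispatch step that sends each query to the matching kind of backend. The Python sketch below is purely illustrative: the keyword sets, the regular expression, and the names <code>Backend</code> and <code>route</code> are assumptions, not a description of how Wolfram Alpha actually routes queries.</p>

```python
import re
from enum import Enum


class Backend(Enum):
    STATIC_TABLE = "static tables and databases"
    LIVE_FEED = "dynamic data feeds"
    MATH = "numerical inference"
    SIMULATION = "numerical computations and simulations"


# Hypothetical trigger words; a real router would need far richer signals.
LIVE_FEED_WORDS = {"weather", "msft", "csco", "iss"}
SIMULATION_WORDS = {"tide", "tides", "orbit", "molar"}
# Either a bare arithmetic expression or a recognizable math verb/phrase.
MATH_PATTERN = re.compile(r"^[\d\s+\-*/^().]+$|\b(?:integrate|square root|derivative)\b")


def route(query: str) -> Backend:
    """Pick a backend for a query using simple, illustrative heuristics."""
    q = query.lower()
    words = set(q.split())
    if MATH_PATTERN.search(q):
        return Backend.MATH
    if words & SIMULATION_WORDS:
        return Backend.SIMULATION
    if words & LIVE_FEED_WORDS:
        return Backend.LIVE_FEED
    # Default: look the answer up in a static table.
    return Backend.STATIC_TABLE


for query in ["2+2", "tides in san Francisco", "MSFT CSCO", "gdp france"]:
    print(query, "->", route(query).value)
```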
<p><span id="more-124"></span></p>
<div id="a000132more">
<div id="more">
<p><strong>Form of queries</strong></p>
<ul>
<li> The queries are expressed in template-based natural language or in corresponding abbreviated forms:</li>
<li> NL syntax: “what is the gdp of france”</li>
<li> Compressed template: {attribute} of {object} {time} (“gdp france 2008”)</li>
<li> Mathematical expressions, or NL versions of these (as one might write in an entry-level LISP class)</li>
<li> I can imagine the query language supports (or could support) restrictions on presentation (plot, chart) and other constraints one might express in SQL (ORDER BY, etc.), though I haven’t seen any examples showing that this exists at present.</li>
</ul>
<p><strong>Presentation and Answers</strong></p>
<ul>
<li> Answers can be a single fact, a table, or a graphical display of a live simulation. Usually they are a combination of these.</li>
<li> For ambiguous queries, it always picks one interpretation; if that interpretation is wrong, you can switch to another through a drop-down menu of alternatives.</li>
</ul>
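<p>The compressed query template, together with the pick-one-interpretation-plus-alternatives behavior, can be sketched in Python. This is a minimal sketch under stated assumptions: the vocabulary tables and names (<code>ATTRIBUTES</code>, <code>OBJECTS</code>, <code>interpret</code>) are hypothetical, list order stands in for a popularity ranking, and, following the guess in the Architecture notes, each attribute lists the object types it applies to, which lets the parser discard nonsensical readings (no “currency” for a US state) before choosing one.</p>

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical curated tables; each attribute lists the object types it applies to.
ATTRIBUTES = {
    "gdp": {"country", "us_state"},
    "population": {"country", "us_state"},
    "currency": {"country"},
}
# Object name -> candidate (entity, type) readings, most likely first.
OBJECTS = {
    "france": [("France", "country")],
    "germany": [("Germany", "country")],
    "georgia": [("Georgia", "country"), ("Georgia", "us_state")],  # ambiguous
}
FILLER = {"what", "is", "the", "of", "in", "was"}


@dataclass
class Interpretation:
    attribute: str
    entity: str
    entity_type: str
    year: Optional[int]


@dataclass
class Answer:
    chosen: Interpretation
    alternatives: List[Interpretation] = field(default_factory=list)


def interpret(query: str) -> Answer:
    """Map NL or compressed queries onto {attribute} of {object} {time}."""
    tokens = [t for t in re.findall(r"[a-z]+|\d{4}", query.lower()) if t not in FILLER]
    attrs = [t for t in tokens if t in ATTRIBUTES]
    objs = [t for t in tokens if t in OBJECTS]
    if not attrs or not objs:
        raise ValueError(f"no template match for {query!r}")
    attribute, obj = attrs[0], objs[0]
    year = next((int(t) for t in tokens if t.isdigit()), None)
    # Keep only readings whose object type the attribute can apply to.
    readings = [
        Interpretation(attribute, name, kind, year)
        for name, kind in OBJECTS[obj]
        if kind in ATTRIBUTES[attribute]
    ]
    if not readings:
        raise ValueError(f"{attribute!r} does not apply to {obj!r}")
    # Always commit to one reading; the rest feed a drop-down of alternatives.
    return Answer(chosen=readings[0], alternatives=readings[1:])


answer = interpret("population of georgia")
print(answer.chosen)
print([alt.entity_type for alt in answer.alternatives])
```

<p>The natural-language form and the compressed form reduce to the same structured reading, so everything downstream (lookup, charting) only has to handle one representation.</p>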
<p><strong>Domains and Generality</strong></p>
<ul>
<li> Wolfram Alpha is described as an open-domain question-answering system over structured data. But in what sense exactly is it open-domain? I distinguish three levels of domain generality:
<ul>
<li> Closed domain: a single, specified domain</li>
<li> Multi-domain: multiple domains are covered and more can be added, but each one is still treated as closed. Note: this can be accomplished through either a unified or a disjoint treatment of the domains.</li>
<li> Open domain: any domain is within scope</li>
</ul>
</li>
<li> For Wolfram Alpha, they have taken a domain-by-domain approach. For each domain, they determined what types of questions to support and which data, feeds, or simulations to incorporate, and did hand curation to enable these.</li>
<li> The domains are typically fact- and data-oriented, especially ones where simulations are available.</li>
</ul>
<p><strong>Architecture</strong></p>
<ul>
<li> The system is written in Mathematica, about 4.5M lines of code, and is being developed by a large team (100 people at present).</li>
<li> From this <a href="http://www.wolfram.com/products/mathematica/quickoverview/">presentation on Mathematica</a>, it is quite easy to extrapolate what Wolfram Alpha is like &#8211; essentially Mathematica + a vast library of attached mathematical models and data + some error-tolerant processing of the user&#8217;s input (thanks to Peter Clark for pointing this out).</li>
<li> Piecing together the Mathematica approach and generalizing from the examples and my own knowledge, I believe they have a basic set of representational tools that is shared across multiple domains. Here&#8217;s how I would think about this:
<ul>
<li> Define the objects in the domain</li>
<li> Make a table of function names and attributes in the domain, and for each function or attribute list the restrictions on the types of objects it can apply to</li>
<li> Standardize representations of time and place, and the charting elements associated with them</li>
<li> Import and normalize data</li>
<li> Associate data fields with objects and attributes in the domain</li>
</ul>
</li>
</ul>
<p><strong>Infrastructure</strong></p>
<ul>
<li> The system runs on thousands of expensive servers (running Mathematica in real time).</li>
<li> Apparently each group of 10 machines handles 1 query per second (qps), so 1,000 machines can handle 100 qps.</li>
</ul>
<p><strong>What is innovative about this</strong></p>
<ul>
<li> Rich mathematical computational infrastructure (Mathematica) to support the mathematical aspects of natural language queries</li>
<li> Integration of mathematical inference and simulations with structured data in a single question-answering system</li>
<li> Unprecedented level of structured data aggregation and curation</li>
<li> Rich presentation including static and dynamic elements and multiple modalities</li>
<li> (Potentially) Deployment of NL-to-SQL query translation in a multi-domain system. The technology to do this has existed for several years, but I don’t know whether anyone has deployed it yet. I’m not sure whether Wolfram has deployed it, and I haven’t seen enough examples to tell.</li>
</ul>
<p><strong>What it doesn’t do</strong></p>
<ul>
<li> Queries or presentation over unstructured data (neither keyword nor NL queries over unstructured data, which is a strength of <a href="http://www.powerset.com/">Powerset</a>)</li>
<li> Queries requiring ontological or commonsense inference (whether over structured or unstructured data, which is a strength of True Knowledge and <a href="http://www.cyc.com/">Cyc</a>)</li>
<li> Answers in support of transactions (e.g. price feeds from many merchants or airlines), which many major search engines offer in various stages of development</li>
<li> Queries that cross multiple domains (e.g. “what was the weather in San Francisco when Yahoo was founded”, which is a strength of True Knowledge)</li>
</ul>
<p><strong>Implications for the field</strong></p>
<ul>
<li> Question answering has been an important part of search results all along, but it has often been a second-class citizen and rarely promoted</li>
<li> By broadening the coverage of structured questions (in terms of data and domains), Wolfram Alpha can increase awareness and usage of question-answering systems</li>
<li> This should make question answering more of a competitive feature across search engines</li>
<li> Users will want to ask questions over both structured and unstructured data, not just structured data, which will increase the perceived differentiation of technology like Powerset&#8217;s</li>
<li> If the use of structured data and simulations proves valuable to large numbers of users and search engines, this will increase the need to transform queries and route them to vertical experts, potentially developed by ecosystem partners</li>
<li> This will increase the need for, and the value to, ecosystem players of adding semantic markup to their structured data and simulations, making it easier to offer more semantic question answering and integration with other services, and expanding the value of search engines&#8217; services in a virtuous cycle</li>
</ul>
<p><strong>Conclusion</strong></p>
<p>In conclusion, Wolfram Alpha is not going to be a new search engine or a universal answer engine. It is not going to put the existing major players or semantic search startups out of business. But there appears to be real innovation here, leading to at least a <span style="text-decoration: underline;">new kind of system</span> that we have not seen before. I am eagerly looking forward to my turn to try it out.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2009/03/wolfram-alpha-a-new-kind-of-question-answering-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microsoft to acquire Powerset</title>
		<link>http://www.barneypell.com/2008/07/microsoft-to-acquire-powerset/</link>
		<comments>http://www.barneypell.com/2008/07/microsoft-to-acquire-powerset/#comments</comments>
		<pubDate>Thu, 03 Jul 2008 15:50:32 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=118</guid>
		<description><![CDATA[On Monday, Microsoft and Powerset announced that Powerset is being acquired by Microsoft. In terms of timing, the companies announced that the deal was signed. There is still the customary period before the deal is officially closed (at which point, I expect we&#8217;re going to have a great party). I&#8217;m including, below, the text of [...]]]></description>
			<content:encoded><![CDATA[<p>On Monday, Microsoft and Powerset announced that Powerset is being acquired by Microsoft.</p>
<p>In terms of timing, the companies announced that the deal was signed. There is still the customary period before the deal is officially closed (at which point, I expect we&#8217;re going to have a great party).</p>
<p>I&#8217;m including, below, the text of the announcements from the blogs of Powerset and Microsoft.<br />
I think these sum up the logic behind the acquisition on both sides pretty well.</p>
<p>It took a lot of work by many people to make this happen. Most significant, of course, was the entire team at Powerset, who executed so well to build and launch a wonderful product that showed the world what is now possible.</p>
<p>Immediately following the announcement, we had a day of calls with members of the press, which resulted in a lot of coverage. I&#8217;ll try to post a collection of links next week.</p>
<p>One press meeting that I really enjoyed was a <a href="http://www.techcrunch.com/2008/07/02/interview-with-barney-pell-and-ramez-naam-about-microsoft%e2%80%99s-powerset-acquisition-integration-to-begin-this-year/">podcast with me, Ramez Naam (Group Program Manager for Microsoft Live Search), and Mike Arrington for TechCrunch</a>.  That link provides an article, transcript, and the full audio of the interview.</p>
<p>There is a lot more to say about Powerset, Microsoft, the acquisition, and what it means for the future of search, linguistic technology, semantic web, etc. I am excited to be staying on with Microsoft in a strategy and evangelist role and I am looking forward to the chance to talk and write a lot more about this, and from a whole new perspective, soon.</p>
<p>Here is the text of <a href="http://www.powerset.com/blog/articles/2008/07/01/microsoft-to-acquire-powerset">Powerset&#8217;s blog announcement</a>:</p>
<blockquote><p>We’re excited to announce officially that Microsoft has signed an agreement to acquire Powerset.</p>
<p>Powerset has always been a small company with big dreams, with the ultimate goal of changing the way humans interact with computers through language. We set out to improve search by indexing Web pages based on the meaning expressed in them rather than just the literal words. Powerset licensed breakthrough technology from PARC, hired world-renowned computational linguists and search engineers, and recently released a search and discovery experience for Wikipedia articles. Our technology helps to improve search results and also makes new features possible, such as Factz, which aggregates information from many articles to summarize a topic.</p>
<p>With any startup, the challenge is to take the seeds of an idea and grow it into a viable company. At Powerset, we transformed our idea into a world-class semantic search platform, demonstrating the future of search with our Wikipedia search experience. But building a large-scale semantic search engine is expensive, requiring an engineering effort and computing resources beyond what most start-ups could ever imagine. Because our goals around improving search align so well, Powerset has decided to team up with Microsoft. We believe that this is the fastest way to bring our technology to market at a large scale.</p>
<p>Microsoft shares our goal to improve search through deeper analysis of queries and documents, and understands that our technology and expertise will play a key role in the evolution of search. With an existing search infrastructure, incredible capital resources, unlimited data, a leading search team, and clear mission to revolutionize the search landscape, Microsoft can rapidly accelerate our progress in building semantic search technology and bringing it to full Web scale. When we launched our first product, we heard: this is great, but when and how will we get Powerset to go beyond Wikipedia? Microsoft accelerates our ability to move Powerset to the entire Web faster than anyone could have imagined.</p>
<p>Powerset will continue to operate much as we currently do, working in the same building, with the same organizational structure, and with the same uniquely talented and growing team (apply on our jobs page). We’ll continue to tackle the hardest problems in parsing, semantics, ranking, indexing, scalable computing, user experience and all of our other specialties. But now we’ll do it with the support of Microsoft and the vast resources of the entire Live Search team.</p>
<p>Over the past couple of years Powerset has made amazing progress. Starting with just a big idea, we licensed the best linguistic technology, recruited a top-notch team, built out our datacenter, engineered a world-class semantic search platform, tackled deep natural language issues, improved relevance, innovated an interface and launched a great product. So few start-ups ever tackle such deep, scientific problems successfully and create the kind of value we’ve delivered in such short order.</p>
<p>For now, Powerset.com will continue to host our Wikipedia Search &amp; Discovery and we’ll be continuing to experiment with our product, based on user feedback. But, expect many announcements from us in the coming months about how we’re integrating our technology and features into Live Search.</p></blockquote>
<p>And here&#8217;s the text of <a href="http://blogs.msdn.com/livesearch/archive/2008/07/01/powerset-joins-live-search.aspx">Microsoft&#8217;s blog announcement</a>:</p>
<blockquote><p>Powerset joins Live Search</p>
<p>We&#8217;re excited to announce that we&#8217;ve reached an agreement to acquire Powerset, a San Francisco-based search and natural language company.</p>
<p>Powerset will join our core Search Relevance team, remaining intact in San Francisco. Powerset brings with it natural language technology that nicely complements other natural language processing technologies we have in Microsoft Research.</p>
<p>More importantly, Powerset brings to Live Search a set of talented engineers and computational linguists in downtown San Francisco. This is a great team with a wide range of experience from other search engines and research organizations like PARC (formerly Xerox PARC).</p>
<p>We&#8217;re buying Powerset first and foremost because we&#8217;re impressed with the people there. Powerset CTO and cofounder Barney Pell is a visionary and incredible evangelist. When he introduced our senior engineers to some of the most senior people at Powerset — Search engineers and computational linguists like Tim Converse, Chad Walters, Scott Prevost, Lorenzo Thione, and Ron Kaplan — we came away impressed by their smarts, their experience, their passion for search, and a shared vision.</p>
<p>That shared vision is to take Search to the next level by adding understanding of the intent and meaning behind the words in searches and webpages.</p>
<p>We know today that roughly a third of searches don&#8217;t get answered on the first search and first click. Usually searchers find the information they want eventually, but that often requires multiple searches or clicks on multiple search results. Two specific problems are the most common reasons for this:</p>
<p>* Differences in phrasing or context between a user&#8217;s search and the way the same information is expressed on webpages. Search engines don&#8217;t understand today that &#8220;shrub&#8221; and &#8220;tree&#8221; are similar concepts. We don&#8217;t understand that &#8220;cancer&#8221; sometimes refers to a disease and sometimes refers to a horoscope and when a query or a webpage refers to which.<br />
* Lack of clarity in the descriptions for each webpage in the search results. Sometimes a result looks relevant from its short description on the results page but turns out to be not so relevant when you visit the actual page. As a result, searchers frequently click results and then rapidly click back when they realize they aren&#8217;t what they&#8217;re looking for.</p>
<p>These problems exist because search engines today primarily match words in a search to words on a webpage. We can solve these problems by working to understand the intent behind each search and the concepts and meaning embedded in a webpage. Doing so, we can innovate in the quality of the search results, in the flexibility with which searchers can phrase their queries, and in the search user experience. We will use knowledge extracted from webpages to improve the result descriptions and provide new tools to help customers search better.</p>
<p>Working with our existing Search team and other Microsoft teams that focus on natural language, Powerset will help us address all of those problems and opportunities.</p>
<p>We&#8217;re looking to add even more talented engineers to the San Francisco team to accelerate our shared progress. If you&#8217;re interested in joining the team, drop us a line.</p>
<p>We&#8217;ll have more to say about the things we&#8217;re doing in understanding searches and webpages through natural language technology in the coming months. In the meantime, please join me in welcoming Powerset to Microsoft!</p>
<p>Satya Nadella, Senior Vice President, Search, Portal, and Advertising</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2008/07/microsoft-to-acquire-powerset/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>In 5 years we will search more with voice than typing</title>
		<link>http://www.barneypell.com/2008/02/in-5-years-we-will-search-more-with-voice-than-typing/</link>
		<comments>http://www.barneypell.com/2008/02/in-5-years-we-will-search-more-with-voice-than-typing/#comments</comments>
		<pubDate>Tue, 26 Feb 2008 20:30:25 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=111</guid>
		<description><![CDATA[David Vogelpohl wrote an article, Will Microsoft Resurrect Natural Language Search, citing a recent AP article about Bill Gates and voice-based search. Here are some quotes from the AP article: People will increasingly interact with computers using speech or touch screens rather than keyboards, Microsoft Corp. Chairman Bill Gates said. “It’s one of the big bets [...]]]></description>
			<content:encoded><![CDATA[<p>David Vogelpohl wrote an article, <a href="http://www.marketingpilgrim.com/2008/02/will-microsoft-resurrect-natural-language-search.html">Will Microsoft Resurrect Natural Language Search</a>, citing a recent <a href="http://biz.yahoo.com/ap/080222/gates_goodbye_keyboards.html?.v=2">AP article</a> about Bill Gates and voice-based search.  Here are some quotes from the AP article:</p>
<blockquote><p>People will increasingly interact with computers using speech or touch screens rather than keyboards, Microsoft Corp. Chairman Bill Gates said.</p>
<p>“It’s one of the big bets we’re making,” he said during the final stop of a farewell tour before he withdraws from the company’s daily operations in July.</p>
<p>In five years, Microsoft expects more Internet searches to be done through speech than through typing on a keyboard, Gates told about 1,200 students and faculty members Thursday at Carnegie Mellon University.</p></blockquote>
<p>David conjectures, as do I, that when people speak their searches they are more likely to use natural language than to use keywordese, and that this could change the game in search.</p>
<blockquote><p>I personally can envision Microsoft trying to integrate speech based data entry as closely as possible with our normal style of speaking. Perhaps the phrase “Where can I buy a hd tv?” would be more natural for searchers when you take away the limitations of the keyboard.</p>
<p>Widespread speech based data entry will almost certainly impact the way Microsoft and subsequently all other search engines deal with search queries.</p></blockquote>
<p>It&#8217;s interesting to see Bill Gates predicting this to happen within 5 years. In the blink of an eye, an entire industry is going to change dramatically.</p>
<p>While on the topic of predictions about voice and language, here&#8217;s one of my predictions that I have been meaning to write up:</p>
<blockquote><p>Within 8 years from now (2016), every category of consumer electronics will have some linguistic interface as a standard feature.</p></blockquote>
<p>By &#8220;linguistic interface&#8221;, I mean voice interactions or text-based interactions that are language-based. Not that these devices won&#8217;t still have nonlinguistic interfaces too (e.g. there will most likely still be buttons). And by &#8220;every category&#8221;, I mean you will not find a category of consumer electronics that lacks at least one product with that feature.</p>
<p>For example, users will expect to be able to talk to cameras, TVs, stereos, iPods, phones, watches, microwave ovens, refrigerators, cars, etc. There will still be some cameras that aren&#8217;t language-enabled, but every category will have some products that are.</p>
<p>As my friends Cliff Nass and Scott Brave write in their book, <a href="http://www.amazon.co.uk/Voice-Activated-Psychology-Interfaces-Wirelesses/dp/1575863324">Voice Activated</a>, when people interact with devices using voice, it also invokes the rest of their social apparatus. You can&#8217;t hear a voice without ascribing some kind of personality, gender, race, social status, etc. to the source of the voice. So in addition to expecting linguistic capability, we&#8217;re also going to start expecting personality within the next decade.</p>
<p>I&#8217;ll stop here before I get carried away to the singularity&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2008/02/in-5-years-we-will-search-more-with-voice-than-typing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Powerset in Forbes article on the Language of Search</title>
		<link>http://www.barneypell.com/2008/02/powerset-in-forbes-article-on-the-language-of-search/</link>
		<comments>http://www.barneypell.com/2008/02/powerset-in-forbes-article-on-the-language-of-search/#comments</comments>
		<pubDate>Mon, 25 Feb 2008 00:16:54 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=109</guid>
		<description><![CDATA[Forbes.com has a special issue on language, including interesting articles and interviews by some of my favorite writers on Language. I&#8217;m happy that natural language and semantic search was included in the special issue. Andy Greenberg from Forbes.com published his piece on language and search engines devoting a good portion of the article to Powerset [...]]]></description>
			<content:encoded><![CDATA[<p>Forbes.com has a special issue on language, including interesting articles and interviews by some of my favorite writers on language.</p>
<p>I&#8217;m happy that natural language and semantic search were included in the special issue. Andy Greenberg from Forbes.com published his piece on language and search engines, devoting a good portion of the article to <a href="http://www.powerset.com/">Powerset</a> and <a href="http://www.hakia.com/">Hakia</a>, featuring interviews with me and with Hakia&#8217;s founder Riza Berkan. The article, entitled <a href="http://www.forbes.com/business/2008/02/21/search-engine-semantic-tech-cx_ag_language_sp08_0221hakia.html">&#8220;Language Web-lish&#8221;</a>, starts off with Andy using Powerset&#8217;s metaphor comparing people&#8217;s current use of search engines to communicating like cavemen:</p>
<blockquote><p>A question in English, like &#8220;What year was Hillary Clinton born?&#8221; becomes what he calls a primitive &#8220;keywordese&#8221;: &#8220;Hillary Clinton born year.&#8221;</p>
<p>&#8220;We have this great gift of human intelligence based around language,&#8221; says Pell, &#8220;and now we have to translate it into a grunting pidgin language to interact with machines.&#8221;</p></blockquote>
<p>Andy described an example I showed him from Powerset:</p>
<blockquote><p>When a user enters the question, &#8220;In what year was Hillary Clinton born?,&#8221; Powerset&#8217;s algorithm doesn&#8217;t simply scour the Web for this collection of words in close proximity. Instead, it looks at pages with an eye for their meaning. Reading the sentence &#8220;Born to Dorothy and Hugh Rodham in 1947, Hillary Clinton is a New York senator,&#8221; Powerset will disassemble the sentence&#8217;s grammar and extract the fact of Hillary Clinton&#8217;s birth date. That fact is then connected with the user&#8217;s question, even if the word order of the result and the query didn&#8217;t originally match.</p></blockquote>
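<p>The idea of extracting a fact from a sentence and then matching it to a question regardless of word order can be illustrated with a toy Python sketch. The regular expressions are a crude stand-in, assumed for illustration only, for the grammatical analysis Powerset performs; the names <code>Fact</code>, <code>extract_facts</code>, and <code>answer</code> are hypothetical.</p>

```python
import re
from typing import List, NamedTuple, Optional


class Fact(NamedTuple):
    subject: str
    relation: str
    value: str


NAME = r"([A-Z]\w+(?: [A-Z]\w+)*)"
# Toy surface patterns standing in for a real grammatical parse.
# Each entry: (compiled pattern, subject group index, value group index).
SENTENCE_PATTERNS = [
    # "Born ... in 1947, Hillary Clinton is ..." (year before name)
    (re.compile(r"Born .*? in (\d{4}), " + NAME + " is"), 2, 1),
    # "Alan Turing was born in 1912" (name before year)
    (re.compile(NAME + r" was born in (\d{4})"), 1, 2),
]
QUESTION_PATTERN = re.compile(r"(?:in )?what year was (.+?) born\?", re.IGNORECASE)


def extract_facts(sentence: str) -> List[Fact]:
    """Turn sentences into (subject, relation, value) triples."""
    facts = []
    for pattern, subj_group, value_group in SENTENCE_PATTERNS:
        for m in pattern.finditer(sentence):
            facts.append(Fact(m.group(subj_group), "birth_year", m.group(value_group)))
    return facts


def answer(question: str, facts: List[Fact]) -> Optional[str]:
    """Match the question to a stored fact, independent of the source's word order."""
    m = QUESTION_PATTERN.search(question)
    if m is None:
        return None
    wanted = m.group(1).lower()
    for fact in facts:
        if fact.relation == "birth_year" and fact.subject.lower() == wanted:
            return fact.value
    return None


facts = extract_facts("Born to Dorothy and Hugh Rodham in 1947, Hillary Clinton is a New York senator.")
print(answer("In what year was Hillary Clinton born?", facts))  # -> 1947
```

<p>Both sentence patterns reduce to the same (subject, relation, value) triple, which is why the question matches whether the year appears before or after the name.</p>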
<p>Andy also went through an example from Hakia:</p>
<blockquote><p>Taking the question &#8220;What drug is best for treating a urinary tract infection?&#8221; Riza Berkan points to the word &#8220;drug.&#8221; Hakia&#8217;s algorithm, he says, understands that the word contains a massive subset of concepts including synonyms and specific names of medicines. When it spots a term that falls into that subset, like &#8220;Amoxicillin,&#8221; Hakia can substitute the medicine&#8217;s name for the word &#8220;drug&#8221; in the result.</p>
<p>&#8220;You don&#8217;t want the word &#8216;drug,&#8217; you want the name of the drug,&#8221; says Berkan. &#8220;That&#8217;s a hidden failure in search engines, and people don&#8217;t even know what they&#8217;re missing.&#8221;</p></blockquote>
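<p>The &#8220;drug&#8221; example amounts to matching a concept word in the question against specific instances of that concept in a page. A toy Python sketch, assuming a hypothetical three-entry taxonomy (<code>TAXONOMY</code>) standing in for the much larger concept sets Berkan describes, shows why plain keyword matching misses the page while concept matching finds it:</p>

```python
import re
from typing import List

# Hypothetical mini-taxonomy: concept word -> specific terms that instantiate it.
TAXONOMY = {
    "drug": {"amoxicillin", "ibuprofen", "aspirin"},
}


def words(text: str) -> List[str]:
    """Lowercased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_match(question: str, passage: str) -> bool:
    """Naive keyword search: does the passage contain the question's concept word?"""
    return any(c in words(passage) for c in TAXONOMY if c in words(question))


def concept_matches(question: str, passage: str) -> List[str]:
    """Passage terms that can stand in for a concept word used in the question."""
    question_words = set(words(question))
    hits = []
    for concept, members in TAXONOMY.items():
        if concept in question_words:
            hits.extend(w for w in words(passage) if w in members)
    return hits


q = "What drug is best for treating a urinary tract infection?"
p = "Amoxicillin is an antibiotic."
print(keyword_match(q, p), concept_matches(q, p))  # -> False ['amoxicillin']
```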
<p>Other natural language and semantic search companies mentioned included <a href="http://www.cognitionsearch.com/">Cognition Search</a> and <a href="http://www.lexxe.com/">Lexxe</a>.</p>
<p>As is typical, my friend Peter Norvig at Google gets the last word in the article:</p>
<blockquote><p>Google&#8217;s Peter Norvig, the search giant&#8217;s director of research, knows just how complex semantic algorithms can be: His Berkeley Ph.D. thesis tried to develop one in 1978. Every sentence of text, he says, took weeks to analyze. &#8220;The result was kind of like a dancing bear,&#8221; he says. &#8220;It was amazing that it could dance at all, but we didn&#8217;t expect it to star in the Moscow Ballet.&#8221;</p>
<p>But that doesn&#8217;t mean Google&#8217;s engineers are idly watching semantic search from a distance, says Norvig. The company&#8217;s thousands of engineers are looking at how to incorporate semantic analysis into a search algorithm. But semantic analysis is just one of many directions that Google&#8217;s teams are exploring&#8230; &#8220;Basically, we just do whatever works,&#8221; says Norvig. &#8220;Instead of trying to understand everything, we&#8217;re trying to understand something about billions of pages a week.&#8221;</p>
<p>But does that pragmatic approach leave Google vulnerable to an innovative start-up willing to risk its fate on building meaning-based search from scratch?</p>
<p>&#8220;It&#8217;s unlikely,&#8221; says Norvig. &#8220;But even car companies have to worry about anti-gravity machines.&#8221;</p></blockquote>
<p>I think that analogy is quite a stretch. It&#8217;s more like big car companies having to worry about smaller companies focused on electric cars. They don&#8217;t have to worry about them immediately, but at some point this is going to be the future of their industry.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2008/02/powerset-in-forbes-article-on-the-language-of-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Natural Language and the Semantic Web: ISWC Keynote talk</title>
		<link>http://www.barneypell.com/2007/11/natural-language-and-the-semantic-web-iswc-keynote-talk/</link>
		<comments>http://www.barneypell.com/2007/11/natural-language-and-the-semantic-web-iswc-keynote-talk/#comments</comments>
		<pubDate>Mon, 19 Nov 2007 20:29:54 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Collective Intelligence]]></category>
		<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[ISWC07]]></category>
		<category><![CDATA[Korea]]></category>
		<category><![CDATA[natural language]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=102</guid>
		<description><![CDATA[I gave an invited keynote talk last week at The 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, 2007. The abstract for the talk is below. The image below links to the original video and presentation slides. The live presentation (and video) contains technical demos that aren&#8217;t in the slides. Some [...]]]></description>
			<content:encoded><![CDATA[<p>I gave an invited keynote talk last week at <a href='http://videolectures.net/iswc07_busan/'>The 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, 2007</a>.  The abstract for the talk is below.  The image below links to the original video and presentation slides.</p>
<p>The live presentation (and video) contains technical demos that aren&#8217;t in the slides. Some of the demos are already available inside <a href="http://labs.powerset.com">Powerlabs</a> (e.g. Powermouse, which lets you browse and query our semantic database of facts extracted from Wikipedia), while others are still internal (e.g. an open search box, and output of our natural language system on full sentences). I also gave a detailed walk-through showing how Powerset takes advantage of external semantic resources like <a href="http://wordnet.princeton.edu/">WordNet</a> and <a href="http://www.Freebase.com">Freebase</a>.</p>
<p>For me, the most fun part of the talk was toward the end, where I got to speculate on how ecosystem effects could make natural language search and the semantic web deeper and more powerful more quickly than people might expect. For example, advertisers, publishers, and vertical search sites will be able to contribute ontologies that bring them more users, better internal search, and more revenue, with the side effect that the broad search engines become more knowledgeable about different domains.<br />
The questions afterward were also challenging and interesting.<br />
<a href='http://videolectures.net/iswc07_pell_nlpsw/'><br />
<img src='http://videolectures.net/iswc07_pell_nlpsw/thumb.jpg' border='0' /><br />
<br/>POWERSET &#8211; Natural Language and the Semantic Web</a><br/></p>
<p><span id="more-102"></span><br />
The Semantic Web promises to revolutionize access to information by adding machine-readable semantic information to content that is normally interpretable only by people. It will also revolutionize access to services by adding semantic information to create machine-readable service descriptions. This ambitious vision has been slow to take off because of a chicken-and-egg problem: markup is required before people will build applications, and applications are required before the hard work of markup is worthwhile. Natural language processing (NLP) has advanced to the point where it can break the impasse and open up the possibilities of the Semantic Web. First, NLP systems can now automatically create annotations from unstructured text. This provides the data that Semantic Web applications require. Second, NLP systems are themselves consumers of Semantic Web information and thus provide economic motivation for people to create and maintain such information. For example, a new generation of natural language search systems, as illustrated by Powerset, can take advantage of Semantic Web markup and ontologies to augment their interpretation of underlying textual content. They can also expose Semantic Web services directly in response to natural language queries.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2007/11/natural-language-and-the-semantic-web-iswc-keynote-talk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tim Converse on Proximity is a Hack</title>
		<link>http://www.barneypell.com/2007/09/tim-converse-on-proximity-is-a-hack/</link>
		<comments>http://www.barneypell.com/2007/09/tim-converse-on-proximity-is-a-hack/#comments</comments>
		<pubDate>Wed, 12 Sep 2007 21:03:28 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[natural language search]]></category>
		<category><![CDATA[term proximity]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=100</guid>
		<description><![CDATA[Powerset&#8217;s Tim Converse wrote a great article entitled: Proximity is a Hack. In the article, Tim says that the two biggest improvements in web search were the use of links (including anchor text) and term proximity. The article explores the benefits of term proximity and argues that it works to the extent that it approximates linguistic [...]]]></description>
			<content:encoded><![CDATA[<p>Powerset&#8217;s <a href="http://timconverse.wordpress.com">Tim Converse </a>wrote a great article entitled: <a href="http://timconverse.wordpress.com/2007/06/25/proximity-is-a-hack/">Proximity is a Hack</a>.<br />
In the article, Tim says that the two biggest improvements in web search were the use of links (including anchor text) and term proximity. The article explores the benefits of term proximity and argues that works to the extent that it approximates linguistic relationships in the text.<br />
He concludes that natural language processing of the documents should have the ability to more accurately capture linguistic relationships even if the query itself is in keywordese (as opposed to a natural language query with internal linguistic structure).</p>
<blockquote><p>
To recap: proximity is both a wonderfully powerful relevance feature, and a total hack. It helps enormously, but it’s not what you really want, it’s just sorta somewhat correlated with what you really want. What you need for what you really want is the underlying structure of all that web content: the real syntactic structure of the sentences, how the sentences connect to each other, how the facts relate, and (maybe) how the discourse flows and the topics connect. We’ve squeezed all the juice we can out of webpages considered as word-vectors; now it’s time to parse this stuff and get at the real structure.<br />
Can that be done? A couple of years ago I would have said no, but I hadn’t seen the PARC natural language technology then, and didn’t know that an effort this concerted and well-funded was on the way. Now, do I think that Powerset will do it? I still don’t know, frankly &#8211; there’s so much more to do to make it real and debugged and scaled the way it needs to be. But it’s clear to me that the next big thing in web search is either this or something a whole lot like this, and I think we have the best shot of anyone. And that’s why I’m at Powerset.   </p></blockquote>
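<p>To make the idea of a proximity feature concrete, here is a minimal sketch in Python. It is a hypothetical illustration, not Powerset&#8217;s or any search engine&#8217;s actual ranking code. It assumes naive whitespace tokenization and scores a document by how tightly its words cover the query terms: the number of distinct query terms divided by the length of the shortest word window that contains all of them. The function names <code>min_window_span</code> and <code>proximity_score</code> are made up for this example.</p>

```python
from collections import Counter


def min_window_span(tokens, query_terms):
    """Length of the shortest run of consecutive tokens that contains every
    query term at least once, or None if some term never appears."""
    need = set(query_terms)
    if not need:
        return None
    counts = Counter()  # occurrences of each query term inside the current window
    covered = 0         # how many distinct query terms the window currently holds
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in need:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink the window from the left while it still covers every term.
        while covered == len(need):
            span = right - left + 1
            if best is None or span < best:
                best = span
            left_tok = tokens[left]
            if left_tok in need:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return best


def proximity_score(text, query):
    """Hypothetical proximity feature: 1.0 when the query terms appear
    side by side, smaller as they spread apart, 0.0 if any term is missing."""
    tokens = text.lower().split()   # assumption: naive whitespace tokenization
    terms = set(query.lower().split())
    span = min_window_span(tokens, terms)
    return 0.0 if span is None else len(terms) / span
```

<p>Under these assumptions, &#8220;ibm acquired lotus&#8221; and &#8220;lotus acquired ibm&#8221; both score 1.0 for the query &#8220;ibm acquired&#8221;, while &#8220;ibm said on monday that a rival acquired lotus&#8221; scores 0.25. The feature sees that the words are close together, but not who acquired whom; recovering that relationship is exactly what parsing the text is for.</p>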
<p>The article is definitely good reading for people interested in search and the potential benefits of NLP.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2007/09/tim-converse-on-proximity-is-a-hack/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Technology Review on Building a Better Search Engine</title>
		<link>http://www.barneypell.com/2007/08/technology-review-on-building-a-better-search-engine/</link>
		<comments>http://www.barneypell.com/2007/08/technology-review-on-building-a-better-search-engine/#comments</comments>
		<pubDate>Thu, 09 Aug 2007 20:21:43 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=95</guid>
		<description><![CDATA[Technology Review recently had an article featuring Powerset: Building a Better Search Engine, by Michael Reisman. In addition to Powerset, the article also mentions Hakia and Cognition Search, and closes with a discussion of a semantic search project inside IBM. I am an avid reader of Technology Review and am really excited to have an [...]]]></description>
			<content:encoded><![CDATA[<p>Technology Review recently had an article featuring Powerset: <a href="http://www.technologyreview.com/read_article.aspx?id=19109&#038;a=f">Building a Better Search Engine</a>, by Michael Reisman.<br />
In addition to Powerset, the article also mentions Hakia and Cognition Search, and closes with a discussion of a semantic search project inside IBM.</p>
<p>I am an avid reader of Technology Review and am really excited to have an article about us in this great publication.</p>
<p>The full article is worth reading.</p>
<p>Here are just a few excerpts about natural language search and Powerset&#8217;s technology:</p>
<blockquote><p>The company claims that the engine finds the best answer by considering the meaning and context of the question and related Web pages.<br />
&#8220;Powerset extracts deep concepts and relationships from the texts and the user&#8217;s query and matches them efficiently to deliver a better search,&#8221; Powerset CEO Barney Pell says.</p>
<p>Powerset chief technology officer Ron Kaplan has led PARC&#8217;s XLE team since the 1970s and is the author of much of the technology behind XLE that has been licensed to the company. Kaplan says that he and Pell began to collaborate on the idea about two years ago.<br />
Current methods of searching used by more traditional engines focus on isolated keywords and broad but shallow content coverage. This leaves a lot of room for improvement, Kaplan says.</p>
<p>&#8220;They are really not getting at relationships,&#8221; he notes. &#8220;The best that they do to approximate relationships are words that are close to other words.&#8221; He adds that a much deeper level of analysis is required.</p></blockquote>
<p>The article came in time to announce the upcoming launch of <a href="http://labs.powerset.com">Powerlabs</a>, our early user community:</p>
<blockquote><p>
The company plans to release demo versions of the search engine on its Powerlabs website, where consumers can test-drive the product beginning in September. User feedback will be taken into consideration as Powerset makes the final product, which is slated for release next year.</p>
<p>&#8220;The key challenge is to get the system to the point where people can understand how to use it and get real value out of these systems even though they are not perfect,&#8221; Pell says. &#8220;We are finally at the point where we are going to cross that threshold.&#8221;
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2007/08/technology-review-on-building-a-better-search-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When is Natural Language useful?</title>
		<link>http://www.barneypell.com/2007/04/when-is-natural-language-useful/</link>
		<comments>http://www.barneypell.com/2007/04/when-is-natural-language-useful/#comments</comments>
		<pubDate>Sat, 07 Apr 2007 20:07:38 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=90</guid>
		<description><![CDATA[In talking about Powerset and natural language search, I am frequently asked, &#8220;When is natural language search useful?&#8221; The idea is that there may be specific situations where you really want natural language search. My general response is that this is like asking when natural language is useful for talking to other people. [...]]]></description>
			<content:encoded><![CDATA[<p>In talking about Powerset and natural language search, I am frequently asked &#8220;When is Natural Language search useful?&#8221;.  The idea here is that maybe there are some specific situations where you really want natural language search.  My general response is that this is like asking &#8220;When is Natural Language useful?&#8221; to talk to other people?  The very question assumes that there are some particular situations where you want to use natural language, and others where you would prefer to just grunt out a few words.<br />
I am aware of a number of situations where it is really clear that you want the power and usability of natural language for search, including:</p>
<p><span id="more-90"></span></p>
<ul>
<li>alerts
<li>mobile search
<li>research
<li>needle-in-haystack search
<li>cross-lingual search
</ul>
<p>And I see a number of user segments where it is especially attractive:</p>
<ul>
<li>power searchers want the power of natural language
<li>novice searchers want the usability of natural language
<li>kids haven&#8217;t learned how to speak keywordese yet
<li>seniors don&#8217;t want to learn how to speak keywordese at all
</ul>
<p>Note the pattern here: young and old, novices and experts. It would be surprising if natural language appealed to people at the extremes but was somehow less useful for people in the middle.<br />
But in the quest for data, people ask, &#8220;What percentage of queries to search engines today are in natural language?&#8221; with the related follow-up, &#8220;Is there a subset of use cases evident in those queries that might point out where people naturally turn to natural language?&#8221;<br />
These kinds of questions are perfectly reasonable, and they point to the challenges of promoting a new way of doing things: you want data to support your new way, but most of the data is about the old way. If you buy into the &#8220;need data to justify innovation&#8221; trap, you will find many reasons not to innovate. I recently came up with a concrete analogy to make this clear:</p>
<blockquote><p>A group of business executives take a one-week crash course in Chinese to prepare for a business trip to China. The school records everything the executives say in China so that it can learn how to improve its language teaching. On reviewing the transcripts, it is easy to conclude that business executives really want to use Chinese to greet people, to inquire about prices, to order food, and to find the toilet.</p></blockquote>
<p>So is that what people really want to use Chinese for? Clearly not. This just reflects the small amount of Chinese the executives have learned. If they could say more, they would use it. The analogy to looking at today&#8217;s search engine logs for evidence about when people would want to use natural language is clear.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2007/04/when-is-natural-language-useful/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Barney interviewed by Bambi Francisco on Marketwatch about PARC/Powerset deal</title>
		<link>http://www.barneypell.com/2007/02/barney-interviewed-by-bambi-francisco-on-marketwatch-about-parcpowerset-deal/</link>
		<comments>http://www.barneypell.com/2007/02/barney-interviewed-by-bambi-francisco-on-marketwatch-about-parcpowerset-deal/#comments</comments>
		<pubDate>Tue, 13 Feb 2007 21:07:56 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=86</guid>
		<description><![CDATA[Bambi Francisco, from Marketwatch, interviewed me last Friday about the PARC/Powerset strategic deal announcement. The video is now available. We discussed the following questions: How fundamental is the PARC technology to Powerset? What are the financial terms of the deal? Is PARC now an investor in Powerset? How does the deal being completed affect Powerset&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.marketwatch.com/tvradio/player.asp?bcpid=275898297&#038;bclid=86272812&#038;bctid=496518756">Bambi Francisco, from Marketwatch, interviewed me last Friday</a> about the <a href="http://www.powerset.com/press/07/02/09/parc.html">PARC/Powerset strategic deal announcement</a>. The video is now available.<br />
We discussed the following questions:</p>
<ul>
<li>How fundamental is the PARC technology to Powerset?
<li>What are the financial terms of the deal? Is PARC now an investor in Powerset?
<li>How does the deal being completed affect Powerset&#8217;s product development plans?
<li>Why is this technology so special?
<li>NLP technology has been under development for so long; why is the timing different now?
<li>When is Powerset going to launch?
</ul>
<p>The interview was brief and to the point, and I think we shared some useful perspective on the significance of this deal.<br />
I still need to post an entry about an earlier video interview with Bambi last November. It&#8217;s hard to keep a blog up to date when so much is happening at your startup.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2007/02/barney-interviewed-by-bambi-francisco-on-marketwatch-about-parcpowerset-deal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google and Powerset in the Sunday Times</title>
		<link>http://www.barneypell.com/2006/11/google-and-powerset-in-the-sunday-times/</link>
		<comments>http://www.barneypell.com/2006/11/google-and-powerset-in-the-sunday-times/#comments</comments>
		<pubDate>Sun, 19 Nov 2006 22:39:08 +0000</pubDate>
		<dc:creator>Barney</dc:creator>
				<category><![CDATA[Human Language Technology]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://174.120.172.92/~barneype/?p=81</guid>
		<description><![CDATA[Powerset was mentioned today in The Sunday Times in an article entitled: Quest for last word in search, by Paul Durman. The article interviews my friend Peter Norvig, now director of research at Google, and discusses natural language search. Here are some relevant excerpts: &#8230;Google is trying to find better ways to give its users [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.powerset.com">Powerset</a> was mentioned today in <a<br />
href="http://ww.wtimesonline.co.uk">The Sunday Times</a> in an article<br />
entitled: <a<br />
href="http://www.timesonline.co.uk/article/0,,2095-2459650.html">Quest for<br />
last word in search</a>, by Paul Durman.</p>
<p>The article interviews my friend Peter Norvig, now director of research at Google, and discusses natural language search. Here are some relevant excerpts:</p>
<p><span id="more-81"></span></p>
<blockquote><p>&#8230;Google is trying to find better ways to give its users the information and answers they want. &#8220;We are at the very beginning of search,&#8221; said Norvig. He said his colleagues were &#8220;disappointed&#8221; that most searches still start by typing a couple of words into a box on a web page.</p></blockquote>
<blockquote><p>Google is also working on speech-recognition technology so that, within a few years, you will simply be able to &#8220;tell&#8221; your mobile phone what you are looking for, and Google will go off and find it.
</p></blockquote>
<blockquote><p>But the holy grail is natural language, or semantic, search &#8211; enabling users to pose queries, not with a couple of words, but with a properly phrased question.<br />
Today&#8217;s search engines can help with something simple, such as &#8220;what is the capital of Japan?&#8221; But they struggle with more complex questions, such as &#8220;what companies has IBM bought in the past five years?&#8221;</p></blockquote>
<blockquote><p>The enormous profitability of search advertising is attracting new challengers to this field. Powerset, which hopes to develop a next-generation search engine based on natural language processing, recently raised $12.5m (&#163;6.6m) with the help of a stellar cast of Silicon Valley entrepreneurs, including the founders of PayPal and Napster, and two early &#8220;Googlers&#8221;.
</p></blockquote>
<blockquote><p>Google is relaxed about the threat. Norvig said: &#8220;Anybody who is concentrating on search is great because it pushes all of us to get better. You have to wonder where they&#8217;re going. Is this a complete solution or is this a component of a solution?&#8221; Norvig&#8217;s colleague Louis Monier &#8211; the French founder of the Alta Vista search engine, who is now at Google working on a secret project &#8211; said he gets a dozen invitations a year to join new search companies.</p></blockquote>
<blockquote><p>&#8230;It is often said that the rival that will overthrow Google is only a click away. Monier is sceptical. &#8220;It&#8217;s very difficult to innovate on the scale that we do,&#8221; he said. &#8220;You need a really radical idea, and need to execute it well.&#8221;</p></blockquote>
<p>I think both Peter and Louis are right: search is in its early days, and natural language is the future of search. But it is difficult to enable natural language search at all, and doing it at Google&#8217;s scale is harder still. I&#8217;m not sure I would be &#8220;relaxed&#8221; about the disruption that natural language search will bring. But I&#8217;m really happy that this discussion about natural language search is now happening among the major players, and that Powerset is helping to push the envelope.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.barneypell.com/2006/11/google-and-powerset-in-the-sunday-times/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
