July 3, 2008

Microsoft to acquire Powerset

On Monday, Microsoft and Powerset announced that Powerset is being acquired by Microsoft.

In terms of timing, the companies announced that the deal was signed. There is still the customary period before the deal is officially closed (at which point, I expect we're going to have a great party).

I'm including, below, the text of the announcements from the blogs of Powerset and Microsoft.
I think these sum up pretty well the logic behind the acquisition on both sides.

It took a lot of work by many people to make this happen. Most significant, of course, was the entire team at Powerset, who executed so well to build and launch a wonderful product that showed the world what is now possible.

Immediately following the announcement, we had a day of calls with members of the press, which resulted in a lot of coverage. I'll try to post a collection of links next week.

One press meeting that I really enjoyed was a podcast with me, Ramez Naam (Group Program Manager for Microsoft Live Search), and Mike Arrington for TechCrunch. That link provides an article, transcript, and the full audio of the interview.

There is a lot more to say about Powerset, Microsoft, the acquisition, and what it means for the future of search, linguistic technology, semantic web, etc. I am excited to be staying on with Microsoft in a strategy and evangelist role and I am looking forward to the chance to talk and write a lot more about this, and from a whole new perspective, soon.

Here is the text of Powerset's blog announcement:

We’re excited to announce officially that Microsoft has signed an agreement to acquire Powerset.

Powerset has always been a small company with big dreams, with the ultimate goal of changing the way humans interact with computers through language. We set out to improve search by indexing Web pages based on the meaning expressed in them rather than just the literal words. Powerset licensed breakthrough technology from PARC, hired world-renowned computational linguists and search engineers, and recently released a search and discovery experience for Wikipedia articles. Our technology helps to improve search results and also makes new features possible, such as Factz, which aggregates information from many articles to summarize a topic.

With any startup, the challenge is to take the seeds of an idea and grow it into a viable company. At Powerset, we transformed our idea into a world-class semantic search platform, demonstrating the future of search with our Wikipedia search experience. But building a large-scale semantic search engine is expensive, requiring an engineering effort and computing resources beyond what most start-ups could ever imagine. Because our goals around improving search align so well, Powerset has decided to team up with Microsoft. We believe that this is the fastest way to bring our technology to market at a large scale.

Microsoft shares our goal to improve search through deeper analysis of queries and documents, and understands that our technology and expertise will play a key role in the evolution of search. With an existing search infrastructure, incredible capital resources, unlimited data, a leading search team, and clear mission to revolutionize the search landscape, Microsoft can rapidly accelerate our progress in building semantic search technology and bringing it to full Web scale. When we launched our first product, we heard: this is great, but when and how will we get Powerset to go beyond Wikipedia? Microsoft accelerates our ability to move Powerset to the entire Web faster than anyone could have imagined.

Powerset will continue to operate much as we currently do, working in the same building, with the same organizational structure, and with the same uniquely talented and growing team (apply on our jobs page). We’ll continue to tackle the hardest problems in parsing, semantics, ranking, indexing, scalable computing, user experience and all of our other specialties. But now we’ll do it with the support of Microsoft and the vast resources of the entire Live Search team.

Over the past couple of years Powerset has made amazing progress. Starting with just a big idea, we licensed the best linguistic technology, recruited a top-notch team, built out our datacenter, engineered a world-class semantic search platform, tackled deep natural language issues, improved relevance, innovated an interface and launched a great product. So few start-ups ever tackle such deep, scientific problems successfully and create the kind of value we’ve delivered in such short order.

For now, Powerset.com will continue to host our Wikipedia Search & Discovery and we’ll be continuing to experiment with our product, based on user feedback. But, expect many announcements from us in the coming months about how we’re integrating our technology and features into Live Search.

And here's the text of Microsoft's blog announcement:

Powerset joins Live Search

We're excited to announce that we've reached an agreement to acquire Powerset, a San Francisco-based search and natural language company.

Powerset will join our core Search Relevance team, remaining intact in San Francisco. Powerset brings with it natural language technology that nicely complements other natural language processing technologies we have in Microsoft Research.

More importantly, Powerset brings to Live Search a set of talented engineers and computational linguists in downtown San Francisco. This is a great team with a wide range of experience from other search engines and research organizations like PARC (formerly Xerox PARC).

We're buying Powerset first and foremost because we're impressed with the people there. Powerset CTO and cofounder Barney Pell is a visionary and incredible evangelist. When he introduced our senior engineers to some of the most senior people at Powerset — Search engineers and computational linguists like Tim Converse, Chad Walters, Scott Prevost, Lorenzo Thione, and Ron Kaplan — we came away impressed by their smarts, their experience, their passion for search, and a shared vision.

That shared vision is to take Search to the next level by adding understanding of the intent and meaning behind the words in searches and webpages.

We know today that roughly a third of searches don't get answered on the first search and first click. Usually searchers find the information they want eventually, but that often requires multiple searches or clicks on multiple search results. Two specific problems are the most common reasons for this:

* Differences in phrasing or context between a user's search and the way the same information is expressed on webpages. Search engines don't understand today that "shrub" and "tree" are similar concepts. We don't understand that "cancer" sometimes refers to a disease and sometimes refers to a horoscope and when a query or a webpage refers to which.
* Lack of clarity in the descriptions for each webpage in the search results. Sometimes a result looks relevant from its short description on the results page but turns out to be not so relevant when you visit the actual page. As a result, searchers frequently click results and then rapidly click back when they realize they aren't what they're looking for.

These problems exist because search engines today primarily match words in a search to words on a webpage. We can solve these problems by working to understand the intent behind each search and the concepts and meaning embedded in a webpage. Doing so, we can innovate in the quality of the search results, in the flexibility with which searchers can phrase their queries, and in the search user experience. We will use knowledge extracted from webpages to improve the result descriptions and provide new tools to help customers search better.

Working with our existing Search team and other Microsoft teams that focus on natural language, Powerset will help us address all of those problems and opportunities.

We're looking to add even more talented engineers to the San Francisco team to accelerate our shared progress. If you're interested in joining the team, drop us a line.

We'll have more to say about the things we're doing in understanding searches and webpages through natural language technology in the coming months. In the meantime, please join me in welcoming Powerset to Microsoft!

Satya Nadella, Senior Vice President, Search, Portal, and Advertising

Posted by barney at 3:50 PM | Comments (0) | TrackBack

February 26, 2008

In 5 years we will search more with voice than typing

David Vogelpohl wrote an article, Will Microsoft Resurrect Natural Language Search, citing a recent AP article about Bill Gates and voice-based search. Here are some quotes from the AP article:

People will increasingly interact with computers using speech or touch screens rather than keyboards, Microsoft Corp. Chairman Bill Gates said.

“It’s one of the big bets we’re making,” he said during the final stop of a farewell tour before he withdraws from the company’s daily operations in July.

In five years, Microsoft expects more Internet searches to be done through speech than through typing on a keyboard, Gates told about 1,200 students and faculty members Thursday at Carnegie Mellon University.

David conjectures, as do I, that when people speak their searches they are more likely to use natural language than to use keywordese, and that this could change the game in search.

I personally can envision Microsoft trying to integrate speech based data entry as closely as possible with our normal style of speaking. Perhaps the phrase “Where can I buy a hd tv?” would be more natural for searchers when you take away the limitations of the keyboard.

Wide spread speech based data entry will almost certainly impact the way Microsoft and subsequently all other search engines deal with search queries.

It's interesting to see Bill Gates predicting that this will happen within 5 years. In the blink of an eye, an entire industry is going to change dramatically.

While on the topic of predictions about voice and language, here's one of my predictions that I have been meaning to write up:

Within 8 years (by 2016), every category of consumer electronics will have some linguistic interface as a standard feature.

By "linguistic interface", I mean voice interaction or text-based interaction that is language-based. Not that these devices won't still have nonlinguistic interfaces too (e.g. there will still be buttons, most likely). And by "every category", I mean you will not find a category of consumer electronics that does not have some product in that category with that feature.

For example, users will expect to be able to talk to cameras, TVs, stereos, iPods, phones, watches, microwave ovens, refrigerators, cars, etc. There will still be some cameras that aren't language-enabled, but every category will have some products that are.

As my friends Cliff Nass and Scott Brave write in their book, Voice Activated, when people interact with devices using voice, it also invokes the rest of their social apparatus. You can't hear a voice without ascribing some kind of personality, gender, race, social status, etc., to the source of the voice. So in addition to expecting linguistic capability, we're also going to start expecting personality within the next decade.

I'll stop here before I get carried away to the singularity...

Posted by barney at 8:30 PM | Comments (0) | TrackBack

February 25, 2008

Powerset in Forbes article on the Language of Search

Forbes.com has a special issue on language, including interesting articles and interviews by some of my favorite writers on language.

I'm happy that natural language and semantic search were included in the special issue. Andy Greenberg from Forbes.com published his piece on language and search engines, devoting a good portion of the article to Powerset and Hakia and featuring interviews with me and with Hakia's founder Riza Berkan. The article, entitled "Language Web-lish", starts off with Andy using Powerset's metaphor comparing people's current use of search engines to communicating like cavemen:

A question in English, like "What year was Hillary Clinton born?" becomes what he calls a primitive "keywordese": "Hillary Clinton born year."

"We have this great gift of human intelligence based around language," says Pell, "and now we have to translate it into a grunting pidgin language to interact with machines."

Andy described an example I showed him from Powerset:

When a user enters the question, "In what year was Hillary Clinton born?," Powerset's algorithm doesn't simply scour the Web for this collection of words in close proximity. Instead, it looks at pages with an eye for their meaning. Reading the sentence "Born to Dorothy and Hugh Rodham in 1947, Hillary Clinton is a New York senator," Powerset will disassemble the sentence's grammar and extract the fact of Hillary Clinton's birth date. That fact is then connected with the user's question, even if the word order of the result and the query didn't originally match.

Andy also went through an example from Hakia:

Taking the question "What drug is best for treating a urinary tract infection?" Riza Berkan points to the word "drug." Hakia's algorithm, he says, understands that the word contains a massive subset of concepts including synonyms and specific names of medicines. When it spots a term that falls into that subset, like "Amoxicillin," Hakia can substitute the medicine's name for the word "drug" in the result.

"You don't want the word 'drug,' you want the name of the drug," says Berkan. "That's a hidden failure in search engines, and people don't even know what they're missing."
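The substitution idea in the Hakia example can be sketched in a few lines. This is only a toy illustration under loud assumptions: the `HYPONYMS` table, the function name `instances_for`, and the sample sentence are all made up for the example, and nothing here reflects how Hakia's algorithm is actually built.

```python
# Hypothetical mini-ontology: a category word mapped to specific instances.
# The table is an assumption for illustration, not data from Hakia.
HYPONYMS = {
    "drug": {"amoxicillin", "ciprofloxacin", "ibuprofen"},
}


def instances_for(query, sentence):
    """Return specific instances in `sentence` that satisfy a category word in `query`."""
    # Normalize the sentence into a set of lowercase words without punctuation.
    sentence_words = {w.strip("?.,").lower() for w in sentence.split()}
    found = set()
    for word in query.lower().split():
        # A query word like "drug" stands for every instance listed under it.
        found |= HYPONYMS.get(word.strip("?.,"), set()) & sentence_words
    return found


query = "What drug is best for treating a urinary tract infection?"
sentence = "Amoxicillin is commonly prescribed for a urinary tract infection."
print(instances_for(query, sentence))  # {'amoxicillin'}
```

Note that the sample sentence never contains the word "drug", so a pure keyword match on that term would miss it; the category lookup is what connects the question to the answer.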

Other natural language and semantic search companies mentioned included Cognition Search and Lexxe.

As is typical, my friend Peter Norvig at Google gets the last word in the article:

Google's Peter Norvig, the search giant's director of research, knows just how complex semantic algorithms can be: His Berkeley Ph.D. thesis tried to develop one in 1978. Every sentence of text, he says, took weeks to analyze. "The result was kind of like a dancing bear," he says. "It was amazing that it could dance at all, but we didn't expect it to star in the Moscow Ballet."

But that doesn't mean Google's engineers are idly watching semantic search from a distance, says Norvig. The company's thousands of engineers are looking at how to incorporate semantic analysis into a search algorithm. But semantic analysis is just one of many directions that Google's teams are exploring... "Basically, we just do whatever works," says Norvig. "Instead of trying to understand everything, we're trying to understand something about billions of pages a week."

But does that pragmatic approach leave Google vulnerable to an innovative start-up willing to risk its fate on building meaning-based search from scratch?

"It's unlikely," says Norvig. "But even car companies have to worry about anti-gravity machines."

I think that analogy is quite a stretch. It's more like big car companies having to worry about smaller companies focused on electric cars. They don't have to worry about this immediately but, at some point, this is going to be the future of their industry.


Posted by barney at 12:16 AM | Comments (0) | TrackBack

November 19, 2007

Natural Language and the Semantic Web: ISWC Keynote talk

I gave an invited keynote talk last week at The 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, 2007. The abstract for the talk is below. The image below links to the original video and presentation slides.

The live presentation (and video) contains technical demos that aren't in the slides. Some of the demos are already available inside Powerlabs (e.g. Powermouse, which lets you browse and query our semantic database of facts extracted from Wikipedia), while some of these are still internal (e.g. an open search box, and output of our natural language system on full sentences). I also gave some detailed walk-throughs showing how Powerset takes advantage of external semantic resources like WordNet and Freebase.

For me, the most fun part of the talk was toward the end, where I got to speculate on how ecosystem effects can make natural language search and the semantic web become deeper and more powerful more quickly than people might expect. For example, advertisers, publishers, and vertical search sites will be able to contribute ontologies that enable them to get more users, better internal search, and more revenue, while having as a side effect that the broad search engines get more knowledgeable about different domains. The questions afterward were also challenging and interesting.





POWERSET - Natural Language and the Semantic Web

Continue reading "Natural Language and the Semantic Web: ISWC Keynote talk"

Posted by barney at 8:29 PM | Comments (0) | TrackBack

September 12, 2007

Tim Converse on Proximity is a Hack

Powerset's Tim Converse wrote a great article entitled: Proximity is a Hack.

In the article, Tim says that the two biggest improvements in web search were the use of links (including anchor text) and term proximity. The article explores the benefits of term proximity and argues that it works to the extent that it approximates linguistic relationships in the text. He concludes that natural language processing of the documents should be able to capture linguistic relationships more accurately, even if the query itself is in keywordese (as opposed to a natural language query with internal linguistic structure).

To recap: proximity is both a wonderfully powerful relevance feature, and a total hack. It helps enormously, but it’s not what you really want, it’s just sorta somewhat correlated with what you really want. What you need for what you really want is the underlying structure of all that web content: the real syntactic structure of the sentences, how the sentences connect to each other, how the facts relate, and (maybe) how the discourse flows and the topics connect. We’ve squeezed all the juice we can out of webpages considered as word-vectors; now it’s time to parse this stuff and get at the real structure.

Can that be done? A couple of years ago I would have said no, but I hadn’t seen the PARC natural language technology then, and didn’t know that an effort this concerted and well-funded was on the way. Now, do I think that Powerset will do it? I still don’t know, frankly - there’s so much more to do to make it real and debugged and scaled the way it needs to be. But it’s clear to me that the next big thing in web search is either this or something a whole lot like this, and I think we have the best shot of anyone. And that’s why I’m at Powerset.

The article is definitely good reading for people interested in search and the potential benefits of NLP.
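For readers who haven't seen term proximity in action, here is a minimal sketch of the idea: score a document by how tightly the query terms cluster together. The function names, the scoring formula (number of terms divided by the shortest window containing them all), and the two sample documents are assumptions made for illustration; real search engines use far more elaborate variants, and this is not taken from Tim's article.

```python
from itertools import product


def min_span(tokens, terms):
    """Length of the shortest token window containing every term, or None if one is missing."""
    positions = []
    for term in terms:
        hits = [i for i, tok in enumerate(tokens) if tok == term]
        if not hits:
            return None
        positions.append(hits)
    # Brute force over one position per term; fine for a toy example.
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))


def proximity_score(text, query):
    """Toy relevance score: 1.0 when the terms are adjacent, lower as they spread apart."""
    tokens = text.lower().split()
    terms = query.lower().split()
    span = min_span(tokens, terms)
    return 0.0 if span is None else len(terms) / span


doc_a = "clinton was born in 1947 in chicago"
doc_b = "clinton spoke in chicago where a new hospital was born of donations"

print(proximity_score(doc_a, "clinton born"))  # 2 terms in a 3-token window
print(proximity_score(doc_b, "clinton born"))  # 2 terms in a 10-token window
```

The first document scores higher because the terms sit close together, which here happens to coincide with the relationship we care about. The score knows nothing about who was born, though, which is exactly the sense in which proximity is only correlated with what you really want.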

Posted by barney at 9:03 PM | Comments (0) | TrackBack

August 9, 2007

Technology Review on Building a Better Search Engine

Technology Review recently had an article featuring Powerset: Building a Better Search Engine, by Michael Reisman. In addition to Powerset, the article also mentions Hakia and Cognition Search, and closes with a discussion of a semantic search project inside IBM.

I am an avid reader of Technology Review and am really excited to have an article about us in this great publication.

The full article is worth reading.

Here are just a few excerpts about natural language search and Powerset's technology:

The company claims that the engine finds the best answer by considering the meaning and context of the question and related Web pages. "Powerset extracts deep concepts and relationships from the texts, and the users query and match them efficiently to deliver a better search," Powerset CEO Barney Pell says.

Powerset chief technology officer Ron Kaplan has led PARC's XLE team since the 1970s and is the author of much of the technology behind XLE that has been licensed to the company. Kaplan says that he and Pell began to collaborate on the idea about two years ago. Current methods of searching used by more traditional engines focus on isolated keywords and broad but shallow content coverage. This leaves a lot of room for improvement, Kaplan says.

"They are really not getting at relationships," he notes. "The best that they do to approximate relationships are words that are close to other words." He adds that a much deeper level of analysis is required.

The article came in time to announce the upcoming launch of Powerlabs, our early user community:

The company plans to release demo versions of the search engine on its Powerlabs website, where consumers can test-drive the product beginning in September. User feedback will be taken into consideration as Powerset makes the final product, which is slated for release next year.

"The key challenge is to get the system to the point where people can understand how to use it and get real value out of these systems even though they are not perfect," Pell says. "We are finally at the point where we are going to cross that threshold."

Posted by barney at 8:21 PM | Comments (0) | TrackBack

April 7, 2007

When is Natural Language useful?

In talking about Powerset and natural language search, I am frequently asked "When is Natural Language search useful?". The idea here is that maybe there are some specific situations where you really want natural language search. My general response is that this is like asking "When is Natural Language useful for talking to other people?". The very question assumes that there are some particular situations where you want to use natural language, and others where you would prefer to just grunt out a few words.

I am aware of a number of situations where it is really clear that you want the power and usability of natural language for search, including:

Continue reading "When is Natural Language useful?"

Posted by barney at 8:07 PM | TrackBack

February 13, 2007

Barney interviewed by Bambi Francisco on Marketwatch about PARC/Powerset deal

Bambi Francisco, from Marketwatch, interviewed me last Friday about the PARC/Powerset strategic deal announcement. The video is now available.

We discussed the following questions:

The interview was brief and to the point, and I think we shared some useful perspective on the significance of this deal.

I still need to post an entry about an earlier video interview with Bambi last November. It's hard keeping a blog up to date when so much is happening with your startup company.

Posted by barney at 9:07 PM | TrackBack

November 19, 2006

Google and Powerset in the Sunday Times

Powerset was mentioned today in The Sunday Times in an article entitled: Quest for last word in search, by Paul Durman.

The article interviews my friend Peter Norvig, now director of research at Google, and discusses natural language search. Here are some relevant excerpts:

Continue reading "Google and Powerset in the Sunday Times"

Posted by barney at 10:39 PM | Comments (0) | TrackBack

October 11, 2006

We are all natural language searchers

My Powerset cofounder, Lorenzo Thione, has written a nice article on his blog, in which he argues that we are all natural language searchers. He surveyed the underlying themes in much of the criticism in the current blogstorm about Powerset and natural language search. Lorenzo groups the arguments in support of keyword search into three clusters:

Lorenzo's article addresses each of these points in turn, and it is good reading so I won't summarize all the key points here. I particularly like his response to the "most queries today are short" critique. He introduces the idea of the long tail of failed queries, in which users initially try more natural queries stating what they want, but eventually learn that it doesn't help with the search, so they shorten the queries, which leads to the observation that most queries today are short. It's a bit like looking at the fact that all Model-T cars were black, after Henry Ford decided that's all he would give them, and concluding that there was no market for colorful cars. As Lorenzo says:

The data so far about short queries and past failures of natural language attempts is no indication about what users will really do or not do, as users have never yet been presented with the possibilities of true natural language search.

Combining this with my previous post on my vision of natural language search, this gives a good view of our perspectives on what we think is obvious: that users will ultimately want to interact with search engines in natural language, not just keywordese.

Posted by barney at 2:42 PM | Comments (0) | TrackBack

The Powerset Blogstorm: 1 week later

I wrote a week ago about how Powerset had become the subject of a blog storm, and shared my vision of natural language search. Little did I realize that the storm had barely started. One week later, there are now about 400 blog articles about Powerset, according to Technorati (over 100 with some authority). We got covered by many of the leading writers on search and internet technology. Below are a few comments on some of the articles by high-authority bloggers.

Continue reading "The Powerset Blogstorm: 1 week later"

Posted by barney at 1:09 AM | Comments (2) | TrackBack

October 4, 2006

Powerset and Natural Language Search

Ever since I stated that Powerset was in "semi-stealth" mode about a year ago, I have been pretty quiet about the company on my blog. A few months ago we realized, after going through a fundraising process with a great set of angel investors, that much of Silicon Valley already knew that Powerset was building a natural language search engine. So we finally put some content up on the Powerset website and agreed to let some of our friends write about us. Some of the first articles about Powerset are those by:

But I have been so busy with the company that I just didn't take the time to write up the vision on my own blog.

Powerset has now unexpectedly become the subject of a recent blogstorm, initiated by an article posted yesterday by Matt Marshall on VentureBeat. Since Matt wrote his initial version of the article before he was able to contact us, he expressed skepticism about what he inferred we are trying to do. (Update: Matt Marshall has just written a new article about Powerset, after meeting with me and Steve yesterday.) This article started a debate in the blogosphere, with people coming down on both sides of the "search is great, nobody can compete with Google" vs. "search is broken, go for it" divide (for the former, see Steve Bryant's article, and for the latter, see this article by Richard Koman).

Given all the attention, I want to take time out to share my vision of natural language as the future of search. To start with, I will characterize the conventional thinking as expressed by various critics.

Continue reading "Powerset and Natural Language Search"

Posted by barney at 10:26 PM | Comments (10) | TrackBack

June 23, 2006

Teaching robot dogs linguistic tricks

As quoted in an article on Slashdot:

According to this article at The Engineer Online, researchers led by the Institute of Cognitive Science and Technology in Italy are developing robots that evolve their own language, bypassing the limits of imposing human rule-based communication. The technology, dubbed Embedded and Communicating Agents, has allowed researchers at Sony's Computer Science Laboratory in France to add a new level of intelligence to the AIBO dog. The robot dog has learnt to see a ball and tell another one where the ball is, if it's moving and what colour it is, and the other is capable of recognising it.

I attended a tutorial a few years back where Luc Steels, at Sony's Paris Computer Science Laboratory, discussed earlier versions of this work. The techniques actually seemed reasonable and well thought out, and the way that words compete for niches and evolve in a population of language users was consistent with my intuitive model of how this happens in human populations. It appears that the work has now advanced to where the language includes relational constructs (e.g., an object is moving in a direction).

It's too bad that Sony recently declared the AIBO to be an end-of-life product. I was looking forward to buying one at some point. (For folks interested in AIBO and business models for entertainment robotics, here is a pointer to a Harvard Business Review case about AIBO).

One of the respondents on Slashdot was particularly excited about these developments. His article on Slashdot is worth reading.

Posted by barney at 2:49 PM | Comments (0) | TrackBack

April 8, 2006

In-Car information from the Web: Siemens and SmartWeb project

This press release describes the SmartWeb system. The new system is being developed by experts from the Siemens Corporate Technology Division in Munich and engineers from the Fraunhofer Institute for Computer Architecture and Software Technology (FIRST) in Berlin. The vehicle prototype from the SmartWeb project will be on display for the first time at the CeBIT computer trade show March 9-15 in Hanover.

Examples of the use of such a system include finding the lowest gas prices on your route, finding sports scores, or any other internet information that normally resides in tables on web pages.

I found two aspects of the system description interesting:

Siemens is targeting such a system to reach market in 10 years. Ajay Juneja, founder of Speak With Me (see my previous post on Speak With Me), who pointed me to this announcement, says he can likely have such technology in cars within the next 2 years. I expect the data transmission parts could take longer than the voice technology.

Posted by barney at 3:32 PM | Comments (0) | TrackBack

March 12, 2006

In-car conversational voice interfaces: Speak With Me and VoiceBox

I've been meaning to write about this for a while. Here are two comments about innovations in speech processing for embedded (e.g. in-car) applications.

Posted by barney at 1:28 PM | Comments (0) | TrackBack

March 10, 2006

Calendars and Natural Language

My friend Matt Hurst on his DataMining blog wrote about Spongecell, a new Ajax-based calendar that lets you enter events using natural language. Here are some examples Matt tried, with his comments:

I tried out the site. My initial impression of the look and feel was good, including the appearance of the calendar itself and the bubble tips for new users. I found about 75% of my NL inputs were handled correctly on the system, but as Matt says it is likely that users will learn which cases work well and then get high performance using those patterns.

It's perhaps the best-kept secret at Microsoft, but did you know that Microsoft Outlook already supports some natural language entry of calendar events as well? Open up an appointment (new or existing) and in the day slot for the time, enter "fourth monday of April". Outlook converts that into the correct date. I use this feature all the time.
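To give a feel for what resolving a phrase like that involves, here is a rough sketch that handles only the single pattern "<ordinal> <weekday> of <month>". The function name `parse_nth_weekday` and the lookup tables are made up for the example, and this is an illustration only, not how Outlook actually does it.

```python
from datetime import date

# Lookup tables for the one phrase pattern this sketch understands (assumed for illustration).
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def parse_nth_weekday(phrase, year):
    """Turn a phrase like 'fourth monday of April' into a date in the given year."""
    words = phrase.lower().split()
    if len(words) != 4 or words[2] != "of":
        raise ValueError(f"unsupported phrase: {phrase!r}")
    n = ORDINALS[words[0]]
    weekday = WEEKDAYS.index(words[1])   # Monday is 0, matching date.weekday()
    month = MONTHS.index(words[3]) + 1   # January is 1
    first = date(year, month, 1)
    # Days from the 1st of the month to the first occurrence of the weekday.
    offset = (weekday - first.weekday()) % 7
    return date(year, month, 1 + offset + 7 * (n - 1))


print(parse_nth_weekday("fourth monday of April", 2006))  # 2006-04-24
```

Even this one pattern needs a vocabulary, a grammar, and some calendar arithmetic, which hints at why calendar tools tend to cover a fixed set of phrasings well rather than arbitrary language.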

TechCrunch has a writeup about Spongecell and many other players in the Calendar2.0 space. That page features 73 comments, many by other calendar companies, so to some extent this captures the current state of play on this topic. One related company missing from the list is TimeBridge (product coming soon), a Mayfield portfolio company for which I'm an advisor, along with my friend Mark Drummond (founder of one of the Calendar1.0 companies, called TimeDance, for those who remember).

After writing the first draft of this article, I saw another post on TechCrunch about Google's upcoming calendar. The information was leaked by a beta tester, and includes detailed screenshots. Key elements from that posting:

Update: I got pointed to another cool calendar2.0: 30 Boxes. It also supports natural language entry, and it looks like it makes it really easy to share events with friends and family.

Posted by barney at 3:36 PM | Comments (1) | TrackBack

September 28, 2005

Barney attending Human Language Technology conference (HLT05) in Vancouver next week

I'll be attending the Human Language Technology conference next week in Vancouver, B.C., Canada. I agree with Ron Kaplan's view at last week's Accelerating Change Conference that Conversational User Interfaces will come of age over the next 5-10 years. I'm looking forward to seeing old friends, learning what's hot, and getting a sneak peek at exciting demos at the conference.

If you're going to be at the conference, let me know so we can meet up. Also, I'm interested in human language technologies that can transform search experience, so please comment on this page if you see something we should all know about (or email if you want to keep it between us!).

Posted by barney at 7:31 PM | Comments (0) | TrackBack