March 23, 2009
Wolfram Alpha: A New Kind of Question-Answering System
There has been much excitement recently over the upcoming launch of Wolfram Alpha. This is a new question-answering system developed by Stephen Wolfram, inventor of Mathematica, and it is scheduled for a beta launch in May. Wolfram has been providing demos to industry insiders. I haven’t had a demo yet, but I have learned what I could from reading articles by Nova Spivack (“Wolfram Alpha computes answers to factual questions. This is going to be big”) and Doug Lenat (“I was positively impressed with Wolfram Alpha”). And this weekend I spoke with William Tunstall-Pedoe, CEO of True Knowledge, who also got a demo. Many of my examples and conclusions come from conversations with William (thanks!). Since life is short and so is the attention of web readers, I’ll give the rest of my thoughts in bullet form.
What it is: A new kind of question-answering system.
Examples from the demos:
- Math: “2+2”, followed by a few other simple math questions such as “integrate x sin^4 x dx” and “what is the square root of 18”.
- Business: “gdp france” showed the amount and a graph of how it has changed over time. “gdp france/germany” showed a graph with both amounts and their ratio.
- “internet users in Europe”: showed the total and a chart of current usage by country in Europe, highlighting the biggest and smallest.
- “ISS”: generated a graphic of the International Space Station orbiting Earth, updating in real time.
- “tides in San Francisco”: showed a graph of tides over time; for the late-19th-century data points, times were given in the local time standard in effect back then. “tide NYC 11/12/1922” gave a single answer.
- “weather”: showed a graph of average temperature in Cambridge, MA (where Stephen was during the demo), determined by a reverse IP lookup.
- Computational fluid dynamics: typing in the name of a specific aerofoil produced a picture of that aerofoil along with its differential equations.
- Stock prices: “MSFT CSCO” showed a comparison chart.
- Chemicals: for substances at a given temperature or pressure, it calculated physical properties. “H2SO4” showed a diagram and chemical properties. “5 molar h2so4” did something cool, but I don’t know what.
- Genome sequences: “AGTAG” showed sequences from the human genome that match that pattern.
- Data about people: “How old is Barack Obama” gave his current age. “When was Alan Turing born” gave the answer. “How old is Alan Turing” (a trick question) gave an error message with no human-readable explanation (True Knowledge, by contrast, tells you exactly why this is a trick question).
Coverage of data: It answers questions over the following types of structured data:
- static tables and databases (e.g. a database of internet usage by country by year)
- dynamic data feeds (e.g. historical stock market data, position of space shuttle, weather)
- numerical inference (e.g. math questions)
- numerical computations and simulations (e.g. tides, astronomy, chemistry)
July 3, 2008
Microsoft to acquire Powerset
On Monday, Microsoft and Powerset announced that Powerset is being acquired by Microsoft.
In terms of timing, the companies announced that the deal has been signed. There is still the customary period before the deal officially closes (at which point I expect we’re going to have a great party).
Below, I’m including the text of the announcements from the Powerset and Microsoft blogs.
I think these sum up pretty well the logic behind the acquisition on both sides.
It took a lot of work by many people to make this happen. Most significant, of course, was the entire team at Powerset, who executed so well to build and launch a wonderful product that showed the world what is now possible.
Immediately following the announcement, we had a day of calls with members of the press, which resulted in a lot of coverage. I’ll try to post a collection of links next week.
One press meeting that I really enjoyed was a podcast with me, Ramez Naam (Group Program Manager for Microsoft Live Search), and Mike Arrington for TechCrunch. The TechCrunch post provides an article, a transcript, and the full audio of the interview.
There is a lot more to say about Powerset, Microsoft, the acquisition, and what it means for the future of search, linguistic technology, semantic web, etc. I am excited to be staying on with Microsoft in a strategy and evangelist role and I am looking forward to the chance to talk and write a lot more about this, and from a whole new perspective, soon.
Here is the text of Powerset’s blog announcement:
We’re excited to announce officially that Microsoft has signed an agreement to acquire Powerset. Powerset has always been a small company with big dreams, with the ultimate goal of changing the way humans interact with computers through language. We set out to improve search by indexing Web pages based on the meaning expressed in them rather than just the literal words. Powerset licensed breakthrough technology from PARC, hired world-renowned computational linguists and search engineers, and recently released a search and discovery experience for Wikipedia articles. Our technology helps to improve search results and also makes new features possible, such as Factz, which aggregates information from many articles to summarize a topic.
With any startup, the challenge is to take the seeds of an idea and grow it into a viable company. At Powerset, we transformed our idea into a world-class semantic search platform, demonstrating the future of search with our Wikipedia search experience. But building a large-scale semantic search engine is expensive, requiring an engineering effort and computing resources beyond what most start-ups could ever imagine. Because our goals around improving search align so well, Powerset has decided to team up with Microsoft. We believe that this is the fastest way to bring our technology to market at a large scale.
Microsoft shares our goal to improve search through deeper analysis of queries and documents, and understands that our technology and expertise will play a key role in the evolution of search. With an existing search infrastructure, incredible capital resources, unlimited data, a leading search team, and clear mission to revolutionize the search landscape, Microsoft can rapidly accelerate our progress in building semantic search technology and bringing it to full Web scale. When we launched our first product, we heard: this is great, but when and how will we get Powerset to go beyond Wikipedia? Microsoft accelerates our ability to move Powerset to the entire Web faster than anyone could have imagined.
Powerset will continue to operate much as we currently do, working in the same building, with the same organizational structure, and with the same uniquely talented and growing team (apply on our jobs page). We’ll continue to tackle the hardest problems in parsing, semantics, ranking, indexing, scalable computing, user experience and all of our other specialties. But now we’ll do it with the support of Microsoft and the vast resources of the entire Live Search team.
Over the past couple of years Powerset has made amazing progress. Starting with just a big idea, we licensed the best linguistic technology, recruited a top-notch team, built out our datacenter, engineered a world-class semantic search platform, tackled deep natural language issues, improved relevance, innovated an interface and launched a great product. So few start-ups ever tackle such deep, scientific problems successfully and create the kind of value we’ve delivered in such short order.
For now, Powerset.com will continue to host our Wikipedia Search & Discovery and we’ll be continuing to experiment with our product, based on user feedback. But, expect many announcements from us in the coming months about how we’re integrating our technology and features into Live Search.
And here’s the text of Microsoft’s blog announcement:
Powerset joins Live Search
We’re excited to announce that we’ve reached an agreement to acquire Powerset, a San Francisco-based search and natural language company.
Powerset will join our core Search Relevance team, remaining intact in San Francisco. Powerset brings with it natural language technology that nicely complements other natural language processing technologies we have in Microsoft Research.
More importantly, Powerset brings to Live Search a set of talented engineers and computational linguists in downtown San Francisco. This is a great team with a wide range of experience from other search engines and research organizations like PARC (formerly Xerox PARC).
We’re buying Powerset first and foremost because we’re impressed with the people there. Powerset CTO and cofounder Barney Pell is a visionary and incredible evangelist. When he introduced our senior engineers to some of the most senior people at Powerset — Search engineers and computational linguists like Tim Converse, Chad Walters, Scott Prevost, Lorenzo Thione, and Ron Kaplan — we came away impressed by their smarts, their experience, their passion for search, and a shared vision.
That shared vision is to take Search to the next level by adding understanding of the intent and meaning behind the words in searches and webpages.
We know today that roughly a third of searches don’t get answered on the first search and first click. Usually searchers find the information they want eventually, but that often requires multiple searches or clicks on multiple search results. Two specific problems are the most common reasons for this:
* Differences in phrasing or context between a user’s search and the way the same information is expressed on webpages. Search engines don’t understand today that “shrub” and “tree” are similar concepts. We don’t understand that “cancer” sometimes refers to a disease and sometimes refers to a horoscope and when a query or a webpage refers to which.
* Lack of clarity in the descriptions for each webpage in the search results. Sometimes a result looks relevant from its short description on the results page but turns out to be not so relevant when you visit the actual page. As a result, searchers frequently click results and then rapidly click back when they realize they aren’t what they’re looking for.
These problems exist because search engines today primarily match words in a search to words on a webpage. We can solve these problems by working to understand the intent behind each search and the concepts and meaning embedded in a webpage. Doing so, we can innovate in the quality of the search results, in the flexibility with which searchers can phrase their queries, and in the search user experience. We will use knowledge extracted from webpages to improve the result descriptions and provide new tools to help customers search better.
Working with our existing Search team and other Microsoft teams that focus on natural language, Powerset will help us address all of those problems and opportunities.
We’re looking to add even more talented engineers to the San Francisco team to accelerate our shared progress. If you’re interested in joining the team, drop us a line.
We’ll have more to say about the things we’re doing in understanding searches and webpages through natural language technology in the coming months. In the meantime, please join me in welcoming Powerset to Microsoft!
Satya Nadella, Senior Vice President, Search, Portal, and Advertising
February 25, 2008
Powerset in Forbes article on the Language of Search
Forbes.com has a special issue on language, including interesting articles and interviews by some of my favorite writers on language.
I’m happy that natural language and semantic search were included in the special issue. Andy Greenberg from Forbes.com published his piece on language and search engines, devoting a good portion of the article to Powerset and Hakia and featuring interviews with me and with Hakia’s founder Riza Berkan. The article, entitled “Language Web-lish”, starts off with Andy using Powerset’s metaphor comparing people’s current use of search engines to communicating like cavemen:
A question in English, like “What year was Hillary Clinton born?” becomes what he calls a primitive “keywordese”: “Hillary Clinton born year.” “We have this great gift of human intelligence based around language,” says Pell, “and now we have to translate it into a grunting pidgin language to interact with machines.”
Andy described an example I showed him from Powerset:
When a user enters the question, “In what year was Hillary Clinton born?,” Powerset’s algorithm doesn’t simply scour the Web for this collection of words in close proximity. Instead, it looks at pages with an eye for their meaning. Reading the sentence “Born to Dorothy and Hugh Rodham in 1947, Hillary Clinton is a New York senator,” Powerset will disassemble the sentence’s grammar and extract the fact of Hillary Clinton’s birth date. That fact is then connected with the user’s question, even if the word order of the result and the query didn’t originally match.
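The key idea in Andy’s description is that the sentence and the question are both reduced to the same normalized fact, so word order stops mattering. The toy Python sketch below illustrates only that idea, under loud assumptions: it is not Powerset’s implementation, and the hand-written regular expressions (`BORN_FRONTED`, `BORN_PLAIN`, `BIRTH_QUESTION`) are hypothetical stand-ins for a real grammatical parser, covering just a couple of ways of phrasing a birth year.

```python
import re
from typing import List, NamedTuple, Optional


class Fact(NamedTuple):
    """A normalized fact: who it is about, which relation, and the value."""
    subject: str
    relation: str
    value: str


# Hypothetical stand-ins for a real parser. Each regex recognizes one way of
# phrasing a birth year: a fronted participle ("Born ... in 1947, Name is ...")
# or a plain clause ("Name was born ... in 1912").
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"
BORN_FRONTED = re.compile(rf"^Born\b.*?\bin (\d{{4}}), ({NAME}) (?:is|was)\b")
BORN_PLAIN = re.compile(rf"^({NAME}) was born\b.*?\bin (\d{{4}})")

# Questions are mapped onto the same (subject, relation) shape as the facts.
BIRTH_QUESTION = re.compile(r"^(?:in )?(?:what year|when) was (.+?) born\??$", re.IGNORECASE)


def extract_facts(sentence: str) -> List[Fact]:
    """Turn one sentence into zero or more normalized facts."""
    facts = []
    m = BORN_FRONTED.search(sentence)
    if m:
        facts.append(Fact(subject=m.group(2), relation="birth_year", value=m.group(1)))
    m = BORN_PLAIN.search(sentence)
    if m:
        facts.append(Fact(subject=m.group(1), relation="birth_year", value=m.group(2)))
    return facts


def answer(question: str, facts: List[Fact]) -> Optional[str]:
    """Match the question against stored facts, not against raw sentence text."""
    m = BIRTH_QUESTION.match(question.strip())
    if not m:
        return None
    subject = m.group(1).lower()
    for fact in facts:
        if fact.relation == "birth_year" and fact.subject.lower() == subject:
            return fact.value
    return None


facts = extract_facts("Born to Dorothy and Hugh Rodham in 1947, Hillary Clinton is a New York senator.")
print(answer("In what year was Hillary Clinton born?", facts))  # 1947
```

Because `answer` compares the question’s (subject, relation) pair with stored facts rather than with the sentence’s words, it finds 1947 even though the words appear in a different order in the source sentence than in the question. A real system would replace the regexes with full parsing, which is where the hard work lies.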
Andy also went through an example from Hakia:
Taking the question “What drug is best for treating a urinary tract infection?” Riza Berkan points to the word “drug.” Hakia’s algorithm, he says, understands that the word contains a massive subset of concepts including synonyms and specific names of medicines. When it spots a term that falls into that subset, like “Amoxicillin,” Hakia can substitute the medicine’s name for the word “drug” in the result. “You don’t want the word ‘drug,’ you want the name of the drug,” says Berkan. “That’s a hidden failure in search engines, and people don’t even know what they’re missing.”
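Berkan’s point can be illustrated with a similarly small Python sketch. Everything here is an assumption for illustration, not Hakia’s algorithm: the `TAXONOMY` dictionary is a hypothetical three-entry ontology (only amoxicillin comes from the article), and `keyword_match` is a deliberately naive baseline that looks only for the category word itself.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical, hand-written taxonomy: a category word maps to the specific
# concepts that fall under it. A real system would need a far larger, curated
# ontology; these entries are only illustrative.
TAXONOMY: Dict[str, Set[str]] = {
    "drug": {"amoxicillin", "ibuprofen", "insulin"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_match(question: str, passages: List[str]) -> List[str]:
    """Baseline: keep passages that literally contain the question's category word."""
    wanted = [t for t in tokenize(question) if t in TAXONOMY]
    return [p for p in passages if any(w in tokenize(p) for w in wanted)]


def concept_match(question: str, passages: List[str]) -> List[Tuple[str, str]]:
    """Return (passage, instance) pairs where a passage names a member of a
    category word in the question; the instance stands in for that word."""
    results = []
    for category in (t for t in tokenize(question) if t in TAXONOMY):
        members = TAXONOMY[category]
        for passage in passages:
            for token in tokenize(passage):
                if token in members:
                    results.append((passage, token))
    return results


question = "What drug is best for treating a urinary tract infection?"
passages = ["Your doctor may prescribe amoxicillin.", "Drink plenty of water."]
print(keyword_match(question, passages))  # []
print(concept_match(question, passages))  # [('Your doctor may prescribe amoxicillin.', 'amoxicillin')]
```

The baseline finds nothing because no passage contains the word “drug”, while `concept_match` returns the passage along with “amoxicillin”, the specific name substituted for the category word. The sketch does not rank candidates, which is a separate problem.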
As is typical, my friend Peter Norvig at Google gets the last word in the article:
Google’s Peter Norvig, the search giant’s director of research, knows just how complex semantic algorithms can be: His Berkeley Ph.D. thesis tried to develop one in 1978. Every sentence of text, he says, took weeks to analyze. “The result was kind of like a dancing bear,” he says. “It was amazing that it could dance at all, but we didn’t expect it to star in the Moscow Ballet.” But that doesn’t mean Google’s engineers are idly watching semantic search from a distance, says Norvig. The company’s thousands of engineers are looking at how to incorporate semantic analysis into a search algorithm. But semantic analysis is just one of many directions that Google’s teams are exploring… “Basically, we just do whatever works,” says Norvig. “Instead of trying to understand everything, we’re trying to understand something about billions of pages a week.”
But does that pragmatic approach leave Google vulnerable to an innovative start-up willing to risk its fate on building meaning-based search from scratch?
“It’s unlikely,” says Norvig. “But even car companies have to worry about anti-gravity machines.”
I think that analogy is quite a stretch. It’s more like big car companies having to worry about smaller companies focused on electric cars. They don’t have to worry about this immediately but, at some point, this is going to be the future of their industry.