March 23, 2009
Wolfram Alpha: A New Kind of Question-Answering System
There has been much excitement recently over the upcoming launch of Wolfram Alpha. This is a new question-answering system developed by Stephen Wolfram, inventor of Mathematica, and it is scheduled for a beta launch in May. Wolfram has been providing demos to industry insiders. I haven’t had a demo yet, but I have learned what I could from reading articles by Nova Spivack (“Wolfram Alpha computes answers to factual questions. This is going to be big”) and Doug Lenat (“I was positively impressed with Wolfram Alpha”). And this weekend I spoke with William Tunstall-Pedoe, CEO of True Knowledge, who also got a demo. Many of my examples and conclusions come from my conversation with William (thanks!). Since life is short and so is the attention of web readers, I’ll give the rest of my thoughts in bullet form.
What it is: A new kind of question-answering system.
Examples
- Math: “2+2” and then a few simple math questions: “integrate xsin^4xdx”, “what is the square root of 18”, etc.
- Business: “gdp france” showed amount and graph of how it changed over time. “gdp france/germany” showed graph with both amounts and the ratio
- “internet users in Europe”: Showed total, and a chart of usage by country in Europe, at the current time, specifically highlighting the biggest and smallest
- “ISS”: generates a graphic rendition of the international space station orbiting earth and updating in real-time
- “tides in san Francisco”: showed a graph of tides over time, where the times were listed in the local time regime current in the late 19th century for those data points. “tide NYC 11/12/1922” gave a single answer.
- “weather”: showed graph of average temperature in Cambridge, MA (where Stephen was when doing the demo). Based on reverse IP lookup.
- Computational fluid dynamics: typing in the name of a specific aerofoil produced a picture of that aerofoil along with its differential equations.
- stock prices: “MSFT CSCO” showed comparison chart
- chemicals: For substances at a given temperature or pressure, it calculated physical properties. “H2SO4” showed a diagram and chemical properties. “5 molar h2s04” did something cool, I don’t know what.
- genome sequences: “AGTAG” shows sequences from the human genome that match that pattern
- data about people: “How old is Barack Obama” gives his age now. “When was Alan Turing born” gives the answer. “How old is Alan Turing” (a trick question) gives an error message with no human-readable explanation (True Knowledge, by contrast, tells you exactly why this is a trick question).
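To make the trick question concrete, here is a minimal Python sketch of how an age query could be handled. It is only an illustration under my own assumptions, not how Wolfram Alpha or True Knowledge actually works: the names `Person`, `years_between`, and `answer_how_old` are hypothetical, and the dates are hard-coded rather than looked up. The point is that “how old is X” only has a current answer if X is still alive, and a helpful system can say so.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Person:
    # Hypothetical record type; a real system would pull this from a knowledge base.
    name: str
    born: date
    died: Optional[date] = None  # None means the person is still living


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end."""
    years = end.year - start.year
    # Subtract one if the anniversary has not yet occurred in the end year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def answer_how_old(person: Person, today: date) -> str:
    """Answer 'How old is X', explaining the problem when X has died."""
    if person.died is not None and person.died <= today:
        age_at_death = years_between(person.born, person.died)
        return (f"{person.name} died on {person.died.isoformat()} "
                f"at age {age_at_death}, so there is no current age to report.")
    return f"{person.name} is {years_between(person.born, today)} years old."


if __name__ == "__main__":
    today = date(2009, 3, 23)  # the date of this post
    obama = Person("Barack Obama", date(1961, 8, 4))
    turing = Person("Alan Turing", date(1912, 6, 23), date(1954, 6, 7))
    print(answer_how_old(obama, today))   # Barack Obama is 47 years old.
    print(answer_how_old(turing, today))  # explains that Turing died at age 41
```

The second branch is the human-readable explanation that the demo reportedly lacked: instead of an error, the answer states why the question cannot be answered as asked.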
Coverage of data: It answers questions over the following types of structured data:
- static tables and databases (e.g. a database of internet usage by country by year)
- dynamic data feeds (e.g. historical stock market data, position of space shuttle, weather)
- numerical inference (e.g. math questions)
- numerical computations and simulations (e.g. tides, astronomy, chemistry)
Posted by barney on March 23, 2009 at 10:03 pm | No Comments
July 3, 2008
Microsoft to acquire Powerset
On Monday, Microsoft and Powerset announced that Powerset is being acquired by Microsoft.
In terms of timing, the companies announced that the deal was signed. There is still the customary period before the deal is officially closed (at which point, I expect we’re going to have a great party).
I’m including, below, the text of the announcements from the blogs of Powerset and Microsoft.
I think these sum up pretty well the logic behind the acquisition on both sides.
It took a lot of work by many people to make this happen. Most significant, of course, was the entire team at Powerset, who executed so well to build and launch a wonderful product that showed the world what is now possible.
Immediately following the announcement, we had a day of calls with members of the press, which resulted in a lot of coverage. I’ll try to post a collection of links next week.
One press meeting that I really enjoyed was a podcast with me, Ramez Naam (Group Program Manager for Microsoft Live Search), and Mike Arrington for TechCrunch. That link provides an article, transcript, and the full audio of the interview.
There is a lot more to say about Powerset, Microsoft, the acquisition, and what it means for the future of search, linguistic technology, semantic web, etc. I am excited to be staying on with Microsoft in a strategy and evangelist role and I am looking forward to the chance to talk and write a lot more about this, and from a whole new perspective, soon.
Here is the text of Powerset’s blog announcement:
We’re excited to announce officially that Microsoft has signed an agreement to acquire Powerset.
Powerset has always been a small company with big dreams, with the ultimate goal of changing the way humans interact with computers through language. We set out to improve search by indexing Web pages based on the meaning expressed in them rather than just the literal words. Powerset licensed breakthrough technology from PARC, hired world-renowned computational linguists and search engineers, and recently released a search and discovery experience for Wikipedia articles. Our technology helps to improve search results and also makes new features possible, such as Factz, which aggregates information from many articles to summarize a topic.
With any startup, the challenge is to take the seeds of an idea and grow it into a viable company. At Powerset, we transformed our idea into a world-class semantic search platform, demonstrating the future of search with our Wikipedia search experience. But building a large-scale semantic search engine is expensive, requiring an engineering effort and computing resources beyond what most start-ups could ever imagine. Because our goals around improving search align so well, Powerset has decided to team up with Microsoft. We believe that this is the fastest way to bring our technology to market at a large scale.
Microsoft shares our goal to improve search through deeper analysis of queries and documents, and understands that our technology and expertise will play a key role in the evolution of search. With an existing search infrastructure, incredible capital resources, unlimited data, a leading search team, and clear mission to revolutionize the search landscape, Microsoft can rapidly accelerate our progress in building semantic search technology and bringing it to full Web scale. When we launched our first product, we heard: this is great, but when and how will we get Powerset to go beyond Wikipedia? Microsoft accelerates our ability to move Powerset to the entire Web faster than anyone could have imagined.
Powerset will continue to operate much as we currently do, working in the same building, with the same organizational structure, and with the same uniquely talented and growing team (apply on our jobs page). We’ll continue to tackle the hardest problems in parsing, semantics, ranking, indexing, scalable computing, user experience and all of our other specialties. But now we’ll do it with the support of Microsoft and the vast resources of the entire Live Search team.
Over the past couple of years Powerset has made amazing progress. Starting with just a big idea, we licensed the best linguistic technology, recruited a top-notch team, built out our datacenter, engineered a world-class semantic search platform, tackled deep natural language issues, improved relevance, innovated an interface and launched a great product. So few start-ups ever tackle such deep, scientific problems successfully and create the kind of value we’ve delivered in such short order.
For now, Powerset.com will continue to host our Wikipedia Search & Discovery and we’ll be continuing to experiment with our product, based on user feedback. But, expect many announcements from us in the coming months about how we’re integrating our technology and features into Live Search.
And here’s the text of Microsoft’s blog announcement:
Powerset joins Live Search
We’re excited to announce that we’ve reached an agreement to acquire Powerset, a San Francisco-based search and natural language company.
Powerset will join our core Search Relevance team, remaining intact in San Francisco. Powerset brings with it natural language technology that nicely complements other natural language processing technologies we have in Microsoft Research.
More importantly, Powerset brings to Live Search a set of talented engineers and computational linguists in downtown San Francisco. This is a great team with a wide range of experience from other search engines and research organizations like PARC (formerly Xerox PARC).
We’re buying Powerset first and foremost because we’re impressed with the people there. Powerset CTO and cofounder Barney Pell is a visionary and incredible evangelist. When he introduced our senior engineers to some of the most senior people at Powerset — Search engineers and computational linguists like Tim Converse, Chad Walters, Scott Prevost, Lorenzo Thione, and Ron Kaplan — we came away impressed by their smarts, their experience, their passion for search, and a shared vision.
That shared vision is to take Search to the next level by adding understanding of the intent and meaning behind the words in searches and webpages.
We know today that roughly a third of searches don’t get answered on the first search and first click. Usually searchers find the information they want eventually, but that often requires multiple searches or clicks on multiple search results. Two specific problems are the most common reasons for this:
* Differences in phrasing or context between a user’s search and the way the same information is expressed on webpages. Search engines don’t understand today that “shrub” and “tree” are similar concepts. We don’t understand that “cancer” sometimes refers to a disease and sometimes refers to a horoscope and when a query or a webpage refers to which.
* Lack of clarity in the descriptions for each webpage in the search results. Sometimes a result looks relevant from its short description on the results page but turns out to be not so relevant when you visit the actual page. As a result, searchers frequently click results and then rapidly click back when they realize they aren’t what they’re looking for.
These problems exist because search engines today primarily match words in a search to words on a webpage. We can solve these problems by working to understand the intent behind each search and the concepts and meaning embedded in a webpage. Doing so, we can innovate in the quality of the search results, in the flexibility with which searchers can phrase their queries, and in the search user experience. We will use knowledge extracted from webpages to improve the result descriptions and provide new tools to help customers search better.
Working with our existing Search team and other Microsoft teams that focus on natural language, Powerset will help us address all of those problems and opportunities.
We’re looking to add even more talented engineers to the San Francisco team to accelerate our shared progress. If you’re interested in joining the team, drop us a line.
We’ll have more to say about the things we’re doing in understanding searches and webpages through natural language technology in the coming months. In the meantime, please join me in welcoming Powerset to Microsoft!
Satya Nadella, Senior Vice President, Search, Portal, and Advertising
Posted by barney on July 3, 2008 at 3:50 pm | No Comments
February 26, 2008
In 5 years we will search more with voice than typing
David Vogelpohl wrote an article, Will Microsoft Resurrect Natural Language Search, citing a recent AP article about Bill Gates and voice-based search. Here are some quotes from the AP article:
People will increasingly interact with computers using speech or touch screens rather than keyboards, Microsoft Corp. Chairman Bill Gates said.
“It’s one of the big bets we’re making,” he said during the final stop of a farewell tour before he withdraws from the company’s daily operations in July.
In five years, Microsoft expects more Internet searches to be done through speech than through typing on a keyboard, Gates told about 1,200 students and faculty members Thursday at Carnegie Mellon University.
David conjectures, as do I, that when people speak their searches they are more likely to use natural language than to use keywordese, and that this could change the game in search.
I personally can envision Microsoft trying to integrate speech based data entry as closely as possible with our normal style of speaking. Perhaps the phrase “Where can I buy a hd tv?” would be more natural for searchers when you take away the limitations of the keyboard. Widespread speech based data entry will almost certainly impact the way Microsoft and subsequently all other search engines deal with search queries.
It’s interesting to see Bill Gates predicting this to happen within 5 years. In the blink of an eye, an entire industry is going to change dramatically.
While on the topic of predictions about voice and language, here’s one of my predictions that I have been meaning to write up:
Within 8 years from now (2016), every category of consumer electronics will have some linguistic interface as a standard feature.
By “linguistic interface”, I mean voice interactions or text-based interaction that is language-based. Not that these devices won’t still have nonlinguistic interfaces too (e.g. there will still be buttons, most likely). And by “every category”, I mean you will not find a category of consumer electronics that does not have some product in that category with that feature.
For example, users will expect to be able to talk to cameras, tvs, stereos, ipods, phones, watches, microwave ovens, refrigerators, cars, etc. There will still be some cameras that aren’t language-enabled, but every category will have some products that are.
As my friends Cliff Nass and Scott Brave write in their book, Wired for Speech, when people interact with devices using voice, it also invokes the rest of their social apparatus. You can’t hear a voice without ascribing some kind of personality, gender, race, social status, etc. to the source of the voice. So in addition to expecting linguistic capability, we’re also going to start expecting personality within the next decade.
I’ll stop here before I get carried away to the singularity…
Posted by barney on February 26, 2008 at 8:30 pm | No Comments