
March 23, 2009

Wolfram Alpha: A New Kind of Question-Answering System

There has been much excitement recently over the upcoming launch of Wolfram Alpha. This is a new question-answering system developed by Stephen Wolfram, inventor of Mathematica, and it is scheduled for a beta launch in May. Wolfram has been providing demos to industry insiders. I haven’t had a demo yet, but I have learned what I could from reading articles by Nova Spivack (“Wolfram Alpha computes answers to factual questions. This is going to be big”) and Doug Lenat (“I was positively impressed with Wolfram Alpha”). And this weekend I spoke with William Tunstall-Pedoe, CEO of True Knowledge, who also got a demo. Many of my examples and conclusions come from my conversation with William (thanks!). Since life is short and so is the attention of web readers, I’ll give the rest of my thoughts in bullet form.

What it is: A new kind of question-answering system.

Examples

Coverage of data: It answers questions over the following types of structured data:

Form of queries

  • The queries are expressed in template-based natural language or corresponding abbreviated forms
  • NL syntax: “what is the gdp of france”
  • Template compressed: {attribute} of {object} {time} (“gdp france 2008”)
  • Mathematical expressions, or NL versions of these (as one might do in an entry-level LISP class)
  • I can imagine the query language supports (or could support) restrictions on presentation (plot, chart) and other constraints one might express in SQL (order by, etc.), though I haven’t seen any examples showing this exists at present.

    Presentation and Answers

    • Answers can be a single fact, a table, or a graphical display of a live simulation. Usually it’s a combination of these.
    • For ambiguous queries, it always picks one interpretation, and you can switch to another if that’s wrong (via a drop-down menu of alternatives).
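
To make the compressed template form concrete, here is a minimal Python sketch of how a query like “gdp france 2008” might be mapped onto {attribute} of {object} {time}, and how an ambiguous object name could yield several interpretations, with the first as the default and the rest as the alternatives offered to the user. The vocabularies, the `Query` type and the `interpret` function are my own illustrative assumptions; I have not seen how Wolfram Alpha actually does this.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical vocabularies; a real system would draw these from curated data.
ATTRIBUTES = {"gdp", "population"}
# Each surface name maps to candidate objects, most likely candidate first.
OBJECTS = {
    "france": ["France (country)"],
    "georgia": ["Georgia (country)", "Georgia (US state)"],
}
# Words that only appear in the natural-language form of a query.
FILLER = {"what", "is", "was", "the", "of", "in"}


@dataclass(frozen=True)
class Query:
    attribute: str
    obj: str
    year: Optional[int] = None


def interpret(text: str) -> List[Query]:
    """Return every interpretation of `text`, default interpretation first."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in FILLER]
    attribute = next((t for t in tokens if t in ATTRIBUTES), None)
    name = next((t for t in tokens if t in OBJECTS), None)
    year = next((int(t) for t in tokens if re.fullmatch(r"\d{4}", t)), None)
    if attribute is None or name is None:
        return []  # the query does not fit the template
    return [Query(attribute, candidate, year) for candidate in OBJECTS[name]]
```

With this sketch, `interpret("what is the gdp of france")` and `interpret("gdp france")` produce the same single interpretation, while `interpret("population georgia")` produces two, the first of which would be shown by default.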

    Domains and Generality

  • Wolfram Alpha is described as an open domain question answering system on structured data. But how exactly is this open domain? I distinguish three levels of domain generality:
    • Closed domain: A specified domain
    • Multi domain: Multiple domains are covered, and more may be added over time, but each one is still treated as closed. Note: this can be accomplished through a unified or disjoint treatment.
    • Open domain: Any domain is within scope
  • For Wolfram Alpha they have taken a domain-by-domain approach. For each domain, they determined what type of questions to support, and which data, feeds, or simulations to incorporate, and did hand curation to enable these.
  • The domains are typically fact and data oriented, especially where simulations are available.

    Architecture

  • The system is coded in Mathematica, about 4.5M lines of code, developed by a large team (100 people at present).
  • From this presentation on Mathematica it is quite easy to extrapolate what Wolfram Alpha is like – essentially Mathematica + a vast library of mathematical models and data attached + some error-tolerant processing of the user’s input (thanks Peter Clark for pointing this out).
  • Piecing together the Mathematica approach and generalizing from the examples and my own knowledge, I believe they have a basic set of representational tools that is shared across multiple domains. Here’s how I would think about this:
    • Define the objects in the domain
    • Make a table of function names and attributes in the domain, and for each function or attribute list the restrictions on the type of objects that this can apply to.
    • Standardize representations of time and place and charting elements associated with these.
    • Import and normalize data
    • Associate data fields to objects and attributes in the domain
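
The steps above can be sketched in a few lines of Python. This is only my guess at the shape of such a representation, not Wolfram’s design: the `Entity` and `DomainStore` names, the attribute table and the example values are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """An object in the domain, tagged with its type."""
    name: str
    kind: str  # e.g. "country" or "company"


# Attribute table: each attribute lists the kinds of object it may apply to.
ATTRIBUTE_KINDS: Dict[str, FrozenSet[str]] = {
    "gdp": frozenset({"country"}),
    "employees": frozenset({"company"}),
}

# Facts are keyed by (attribute, object name, year); year is None if timeless.
Key = Tuple[str, str, Optional[int]]


class DomainStore:
    """Holds normalized data and enforces the attribute type restrictions."""

    def __init__(self) -> None:
        self._facts: Dict[Key, float] = {}

    def _check(self, attribute: str, entity: Entity) -> None:
        allowed = ATTRIBUTE_KINDS.get(attribute)
        if allowed is None:
            raise KeyError(f"unknown attribute: {attribute}")
        if entity.kind not in allowed:
            raise TypeError(f"{attribute} does not apply to a {entity.kind}")

    def record(self, attribute: str, entity: Entity, value: float,
               year: Optional[int] = None) -> None:
        """Associate an imported data field with an object and attribute."""
        self._check(attribute, entity)
        self._facts[(attribute, entity.name, year)] = value

    def lookup(self, attribute: str, entity: Entity,
               year: Optional[int] = None) -> Optional[float]:
        """Return the stored value, or None if no data has been recorded."""
        self._check(attribute, entity)
        return self._facts.get((attribute, entity.name, year))


# Usage with made-up placeholder data (not real figures).
store = DomainStore()
exampleland = Entity("Exampleland", "country")
store.record("gdp", exampleland, 1.5, year=2008)
```

The point of the type check is that a question like “gdp of” a company can be rejected (or reinterpreted) before any data is consulted, which is one way a shared set of tools could serve many hand-curated domains.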

    Infrastructure

  • The system runs on thousands of expensive servers (running Mathematica in real time).
  • Apparently each query occupies 10 machines, which together deliver 1 query per second (qps), so 1,000 machines (100 such groups) can handle 100 qps.

    What is innovative about this

  • Rich mathematical computational infrastructure (Mathematica) to support mathematical aspects of natural language queries
  • Integration of mathematical inference and simulations along with structured data in a single question-answering system
  • Unprecedented level of structured data aggregation and curation
  • Rich presentation including static and dynamic elements and multiple modalities
  • (Potentially) Deployment of NL-to-SQL query translation in a multi-domain system. The technology to do this has existed for several years, but I don’t know if anyone has deployed it yet. I’m not sure whether Wolfram has, and I haven’t seen enough examples to tell.

    What it doesn’t do

  • Queries or presentation against unstructured data (neither keyword nor NL queries against unstructured data, which is a strength of Powerset)
  • Queries requiring ontological or commonsense inference (whether structured or unstructured, which is a strength of True Knowledge and Cyc)
  • Answers in support of transactions (e.g. price feeds from many merchants or airlines), which is shown in various stages in many major search engines
  • Queries that span multiple domains (e.g. “what was the weather in San Francisco when Yahoo was founded”, which is a strength of True Knowledge)

    Implications for the field

    • Question answering has been an important part of search results all along, but it has often been a second-class citizen and hardly promoted
    • By increasing the level of comprehensiveness of structured questions (in terms of data and domains), this can increase awareness and usage of question answering systems
    • This should move question answering to be more of a competitive feature across search engines
    • Users will want to ask questions over both structured and unstructured data, not just structured data, which will increase perceived differentiation for technology like Powerset
    • If the use of structured data and simulations proves valuable to a large number of users and search engines, then this will increase the need to transform and route queries to vertical experts, potentially developed by ecosystem partners
    • This will increase the need and value for ecosystem players to add semantic markup to their structured data and simulations, hence making it easier to offer more semantic question answering and integration with other services, and expanding the value of the services by search engines in a virtuous cycle

    Conclusion

    In conclusion, Wolfram Alpha is not going to be a new search engine or a universal answer engine. It is not going to put the existing major players or semantic search startups out of business. But there appears to be real innovation here, leading to at least a new kind of system that we have not seen before. I am eagerly looking forward to my turn to try it out.

  • Posted by barney at March 23, 2009 10:03 pm

    This entry was posted in Collective Intelligence, Human Language Technology, Information retrieval, Powerset, Science, Search, Software, Web/Tech
