
This technology could surely be used for automated fact-checking. I've dreamed about the possibilities for automated fact-checking to improve online debate, and considered attempting to write it myself (using data sources like Wolfram Alpha and Google Public Data, and parsing natural language with a big dirty pile of regular expressions, which can't be the best approach but it's an approach I could attempt in a weekend).

For example, a program could examine a text for sentences matching "[X country] is [comparison] than [Y country]" and check that against online data sources. So if you're typing a comment into reddit or wherever, it would first check your comment for claims it can parse, and alert you if you are mistaken about something so you can change it. And if you are correct about something, but didn't supply a citation, it could add it in automatically.
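A rough sketch of that weekend-project approach, assuming a tiny in-memory table stands in for the online data source. The `AREA_MILLION_KM2` table, its rounded figures, and the `check_claims` helper are all hypothetical, and the pattern only handles single-word country names:

```python
import re

# Hypothetical stand-in for a real data source such as Wolfram Alpha.
# Rough land areas in millions of km^2, for illustration only.
AREA_MILLION_KM2 = {"russia": 17.1, "canada": 10.0, "france": 0.6}

# Map each comparison word the pattern accepts to a test on two numbers.
COMPARISONS = {
    "larger": lambda a, b: a > b,
    "bigger": lambda a, b: a > b,
    "smaller": lambda a, b: a < b,
}

# Matches "[X country] is [comparison] than [Y country]".
CLAIM_RE = re.compile(
    r"\b([A-Z][a-z]+) is (larger|bigger|smaller) than ([A-Z][a-z]+)\b"
)


def check_claims(text):
    """Return (claim, verdict) pairs; verdict is None if a country is unknown."""
    results = []
    for x, comparison, y in CLAIM_RE.findall(text):
        a = AREA_MILLION_KM2.get(x.lower())
        b = AREA_MILLION_KM2.get(y.lower())
        if a is None or b is None:
            verdict = None  # no data, so the claim cannot be checked
        else:
            verdict = COMPARISONS[comparison](a, b)
        results.append((f"{x} is {comparison} than {y}", verdict))
    return results


comment = "I think France is larger than Canada, but Russia is bigger than Canada."
for claim, verdict in check_claims(comment):
    print(claim, "->", verdict)
```

A real version would need a proper parser and a live data source; the point is only that the "match a template, look up both sides, compare" loop is small.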

This could have several awesome effects:

  • It would make online debates more accurate, spreading truth and knowledge.
  • It might set a standard of factual accuracy; people might come to expect assertions to be backed up with citations, and regard claims as suspicious if no supporting reference is provided.
  • Just knowing their facts will be checked (possibly by a future version of the software being re-run on old forum comments and highlighting inaccuracies for all to see) might make people more careful in their assertions.

If voice recognition technology improves, then this might eventually fact-check TV in real-time.

I wish I believed that it wouldn't just lead to people seeding the online data sources with contradictory claims, causing such a fact-checker to return the equivalent of modern journalistic "we report; you decide" fact-agnostic reporting.

Still, the technology would be useful regardless.

A friend of mine, back in his MIT Media Lab days, was working on a "remembrance agent" that more or less worked along these lines but processed local hard drive content, to remind the user of previous conversations/projects/etc. related to what they're currently working on.

What happened to the "remembrance agent" program? A quick search for the term finds the remem extension for emacs, which sounds similar. Might be good for the list of power tools on the wiki.

Apparently the project died for lack of corporate sponsorship; nobody could figure out how to make money with it. Eventually a similar project with a different orientation became a popular product for cross-linking related websites for advertising purposes.

Beats me; I assume he went on to work on other things. I just dropped him some email to ask him about it; I'll let you know if I learn anything interesting.

I want to know the timeframe until similar capability is available from my web browser. I'd guess at perhaps 20% in 5 years, 90% in 10. I'm sure significant engineering would have to be performed to maintain performance while downsizing the hardware requirements.

I'd say that the challenge depends on the business model. Making it efficient enough to be profitable as a free web service will be quite tough, so your estimates are probably appropriate. On the other hand, if they can charge $200/month for it, then those estimates are very conservative...

This is the next step for Google to take. If another company manages to get this service out for free first, they will be the new search engine everyone goes to. This alone could be highly profitable if they aren't as reluctant to use ads as Google is.

There's been a big back-and-forth about how big an impact QA can make on information-retrieval. A few points:

  1. Writing a proper natural language question isn't necessarily easier than writing a Google query. You don't always know what you need to know, and providing feedback from the QA system is very difficult. With a keyword search, by contrast, you can iterate towards the right query fairly well, as your searching teaches you better keywords.

  2. The percentage of queries that can be answered by a single sentence is substantial, but might be smaller than you think. People also use search engines to navigate around, to find long articles of interest, and to carry out tasks (e.g. shopping).

  3. The strictly informational queries probably aren't that important to Google's revenue. The best queries to be serving are the ones where the user wants to buy something, because that's where people will pay for advertising. If a competitor takes away the informational searches but can't serve the commerce searches too, I doubt Google will be sweating much.

True. When I said that, I was thinking of a service that does what Watson does and gives Google-style answers.

So, if the query "What is the capital of the United States" was made, at the top it would say Washington D.C. and after that it would show search results, similar to how Google shows answers to unit conversion searches.

So, if the query "What is the capital of the United States" was made, at the top it would say Washington D.C. and after that it would show search results,

You mean like... erm... Google does at the moment if you search for "What is the capital of the United States?"?

Damn, I was afraid it would show that. So a more difficult query.

Although, that is a sign that Google is already experimenting with that idea, only with a far simpler algorithm.

There is a site, True Knowledge, that attempts to answer such questions using methods similar to Watson's.

It relies on NLP and "facts"; for example, the query "What is the capital of the United States" relies on this fact:

This fact asserts that the relationship '"is the capital of"' exists between "Washington, D.C." and "the United States" at some point in time. Other facts in the knowledge base assert that this fact applies for the following time periods:

Each "fact" is assessed by a variety of sources, both human and automated. For example, the fact above has, as some of its assessments:

Extracted by [true knowledge] from the infobox on Wikipedia page http://en.wikipedia.org/wiki/United_States, downloaded 19 May 2009.

and

Fact extracted from Factacular using natural language processing of the following snippit "The United States of America has the capital Washington, D.C.."
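To make the description concrete, here is a minimal sketch of how such a fact might be stored. This is a guess at the shape of the data, not True Knowledge's actual schema: the `Fact` and `Assessment` classes, their field names, and the `answer` helper are all hypothetical. The validity period is left open because it isn't given above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Assessment:
    source: str   # where the fact was seen
    method: str   # "human" or "automated" (assumed labels)
    note: str = ""


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: Optional[date] = None  # None means no known lower bound
    valid_to: Optional[date] = None    # None means no known upper bound
    assessments: List[Assessment] = field(default_factory=list)

    def holds_on(self, day: date) -> bool:
        """True if `day` falls inside the (possibly open-ended) validity period."""
        after_start = self.valid_from is None or self.valid_from <= day
        before_end = self.valid_to is None or day <= self.valid_to
        return after_start and before_end


capital = Fact(
    subject="Washington, D.C.",
    relation="is the capital of",
    obj="the United States",
    assessments=[
        Assessment("Wikipedia infobox", "automated", "downloaded 19 May 2009"),
        Assessment("Factacular", "automated", "natural language processing"),
    ],
)


def answer(facts, relation, obj, day):
    """Return subjects for which `relation` holds with `obj` on `day`."""
    return [f.subject for f in facts
            if f.relation == relation and f.obj == obj and f.holds_on(day)]


print(answer([capital], "is the capital of", "the United States", date(2009, 5, 19)))
```

Answering "What is the capital of the United States" then reduces to turning the question into a relation and an object, which is the NLP part, followed by a lookup like `answer`.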