syllogism's Comments

Neural program synthesis is a dangerous technology

I don't think I meant to imply that -- could you point out where I seem to be making that assumption?

Obviously there are more exploits for a computer running Windows 95 than for a carefully firewalled Linux server.

Neural program synthesis is a dangerous technology

Thanks!

Okay, I'll paste the content in. I think you're right -- a link post is pretty much strictly worse.

Neural program synthesis is a dangerous technology

First post on LW2, so apologies if I've not followed the norms properly -- let me know if I should edit.

I considered doing this as a cross-post, but it felt weird without rewriting, since the knowledge assumptions were all wrong. So I decided to just link.

Efficient Open Source

I don't think the Hamming advice is so great. It's akin to asking, "What are the highest-salary professions? Why aren't you entering them?"

Academia is a marketplace. Everyone wants high research impact for a given expenditure of time. Some opportunities are higher-value than others, but as those opportunities appear, other researchers will identify them too.

So in academia, as in the economy, it's better to identify your comparative advantage, both short-term and long-term. You usually need to publish something quickly, so you need to know what you can do right away. But you also want to plan for the medium and long term. It's a difficult trade-off.

Learning languages efficiently.

I'm interested in developing better language learning software.

For the movie case, do you think these would be helpful? Any other ideas?

  • Read in the subtitle file before viewing, so that vocab can be checked and learned via spaced repetition (see the sketch after this list)
  • Option to slow down the dialogue, with pitch-shifting to keep it from sounding weird and bassy
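
For the first idea, here's a minimal sketch of the subtitle-scanning step, assuming a standard .srt file. The tokenisation is deliberately crude, and `known_words` is a placeholder for whatever word list the real app would track:

```python
import re

def srt_vocab(path, known_words):
    """Collect unknown vocabulary from an .srt subtitle file.

    Skips the index and timestamp lines, strips basic formatting
    tags, and returns the words not already in known_words.
    """
    timestamp = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}")
    words = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.isdigit() or timestamp.search(line):
                continue
            line = re.sub(r"<[^>]+>", "", line)  # drop <i>...</i> markup
            for token in re.findall(r"[^\W\d_]+", line):
                words.add(token.lower())
    return words - known_words
```

The resulting word list could then be fed into a spaced-repetition queue before the movie is watched.
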
Rationality & Low-IQ People

You'd go pretty far just telling the audience the character was unintelligent, by giving them status markers associated with low intelligence: a blue-collar career and very low academic achievement, while also coming from a stable family with average opportunities.

It's been a while since I watched it, but do you think Ben Affleck's character in Good Will Hunting was rational, but of limited intelligence?

There are scattered examples of this sort of "humble working man, who lives honest and true" throughout fiction.

Arthur Chu: Jeopardy! champion through exemplary rationality

Can't say I'm impressed with his reasoning there.

Interesting.

Arthur Chu: Jeopardy! champion through exemplary rationality

It doesn't seem to me that he has that any more than the other Jeopardy! contenders.

What are you working on? January 2014

In ML, everyone is engaging with the academics, and the academics are doing a great job of making the field accessible, e.g. through MOOCs. ML is one of the most popular targets of ongoing education, because it's recently become prominent and it's a useful feather to have in your cap. It greatly extends the range of programs you can write. Many people realise that, and are doing what it takes to learn. So even if there are some rough spots in the curriculum, the learners are motivated, and the job gets done.

The cousin of language processing is computer vision. The problem we have as academics is that we need to communicate current best-of-breed solutions to software engineers, while also communicating underlying principles to our students and to each other.

If you look at nltk, it's really a tool for teaching our grad students. And yet it's become a software engineering tool of choice, when it should never have been billed as industrial-strength at all. Check out the results in my blog post:

  • NLTK POS tagger: 94% accuracy, 236s
  • My tagger: 96.8% accuracy, 12s

Both are pure Python implementations. I do no special tricks; I just keep things tight and simple, and don't pay costs from integrating into a large framework.
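
For a sense of what "tight and simple" means here, this is a minimal sketch of the averaged-perceptron learner that a tagger like the one in my post is built around. Feature extraction, which does the real linguistic work, is elided, and the names are illustrative rather than the post's actual code:

```python
from collections import defaultdict

class AveragedPerceptron(object):
    """Multiclass perceptron that averages its weights over training.

    The totals/timestamps bookkeeping makes the final weights the
    average of every intermediate weight vector, without storing them.
    """
    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(float))
        self._totals = defaultdict(lambda: defaultdict(float))
        self._stamps = defaultdict(lambda: defaultdict(int))
        self.i = 0  # number of updates seen

    def predict(self, features):
        scores = defaultdict(float)
        for feat in features:
            for label, weight in self.weights[feat].items():
                scores[label] += weight
        return max(scores, key=scores.get) if scores else None

    def update(self, truth, guess, features):
        self.i += 1
        if truth == guess:
            return
        for feat in features:
            for label, delta in ((truth, 1.0), (guess, -1.0)):
                # Credit the old weight for the steps it was live.
                steps = self.i - self._stamps[feat][label]
                self._totals[feat][label] += steps * self.weights[feat][label]
                self._stamps[feat][label] = self.i
                self.weights[feat][label] += delta

    def average(self):
        # Call once after training to replace weights with their averages.
        for feat, labels in self.weights.items():
            for label in labels:
                steps = self.i - self._stamps[feat][label]
                total = self._totals[feat][label] + steps * labels[label]
                labels[label] = total / self.i if self.i else 0.0
```

The whole learner is a couple of nested dicts and two loops; the speed comes from that flatness, not from any algorithmic trick.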

The problem is that the NLTK tagger is part of a complicated class hierarchy that includes a dictionary-lookup tagger, etc. These are useful systems to explain the problem to a grad student, but shouldn't be given to a software engineer who wants to get something done.

There's no reason why we can't have a software package that just gets it done. Which is why I'm writing one :). The key difference is that I'll be shipping one POS tagger, one parser, etc. The best one! If another algorithm comes out on top, I'll rip out the old one and put the current best one in.

That's the real difference between ML and NLP or computer vision. In NLP, we really really should be telling people, "just use this one". In ML, we need to describe a toolbox.

What are you working on? January 2014

I'm currently a post-doc doing language technology/NLP type stuff. I'm considering quitting soon to work full time on a start-up. I'm working on three things at the moment.

  • The start-up is a language learning web app: http://www.cloze.it . What sets it apart from other language-learning software is my knowledge of linguistics, proficiency with text processing, and willingness to code detailed language-specific features. Most tools want to be as language neutral as possible, which limits their scope a lot. So they tend to all have the same set of features, centred around learning basic vocab.

  • Something that's always bugged me about being an academic is that we're terrible at communicating to people outside our field. This means that whenever I see a post using an NLP tool, they're using a crap tool. So I wrote a blog post explaining a simple POS tagger that was better than the stuff in e.g. nltk (nltk is crap): http://honnibal.wordpress.com/2013/09/11/a-good-part-of-speechpos-tagger-in-about-200-lines-of-python/ . The post has gotten over 15k views (mostly from reddit), so I'm writing a follow-up about a concise parser implementation. The parser is 500 lines, including the tagger, and is faster and more accurate than the Stanford parser (the Stanford parser is also crap).

  • I'm doing minor revisions for a journal article on parsing conversational speech transcripts and detecting disfluent words. The system gets good results when run on text transcripts. The goal is to let speech recognition systems produce better transcripts, with punctuation added and stutters etc. removed (a toy sketch of the simplest layer of that filtering follows below). I'm also working on a follow-up paper with further experiments.
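
None of this is the journal system -- just a toy illustration of the easiest layer of the problem, a pass that strips filler words and immediate self-repeats from a transcript:

```python
import re

FILLERS = {"um", "uh", "er", "ah"}

def strip_simple_disfluencies(text):
    """Drop filler words and immediate word repeats from a transcript.

    A crude baseline: it catches "I um I I think" -> "i think", but
    real repairs ("to Boston, uh, to Denver") need a trained model.
    """
    out = []
    for token in re.findall(r"[\w']+", text.lower()):
        if token in FILLERS:
            continue
        if out and out[-1] == token:
            continue  # immediate self-repeat, e.g. "I I"
        out.append(token)
    return " ".join(out)

print(strip_simple_disfluencies("I um I I think we should should go"))
# -> "i think we should go"
```

Anything beyond this (restarts, repairs, multi-word edits) is where the actual modelling work comes in.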

Overall the research is going well, and I find it very engaging. But I'm at the point where I have to start writing grant applications, and selling software seems like a much better expected-value bet.
