Paper reading as a Cargo Cult

A quick example of how paper reading works in my research:

2017: CycleGAN comes out and produces cool pictures of zebras and horses. I skim the paper because it seems cool and file away the concept, but I don't make an effort to replicate the results because, in my experience, GANs are obnoxious to train.

2018: "Which Training Methods for GANs do actually Converge?" comes out, but even though it contains the crucial insight for making GANs trainable, I don't read it because it's not very popular: I never see it.

2019: StyleGAN comes out and cites "Which Training Methods for GANs do actually Converge?" I read both papers. I mostly forget StyleGAN because it seems like a "we have big GPU, do good science" paper, but I am very impressed with "Which Training Methods for GANs do actually Converge?" and take a day or two to replicate it.

2020 (or thereabouts): Around this time I also read all of Gwern's anime GAN training exploits and update my priors towards "maybe large GANs are actually trainable."

2022: I need to convert unlabeled DXA images into matching radiographs as part of a larger project. I'm generally of the opinion that GANs aren't actually useful, but the problem matches the one solved by CycleGAN exactly, and I'm out of options. I initially try the open-source CycleGAN codebase, but as expected it's wildly unstable and miserable. I recall that "Which Training Methods for GANs do actually Converge?" had pretty strong theory backing up gradient penalties on the discriminator, and that I was able to replicate its experiments, so I dust off that replication code, verify that it still works, add a cycle consistency loss, and am able to translate my images. Image translator in hand, I slog back into the larger problem.
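In case it's useful to anyone following along: the gradient penalty that paper introduces (the R1 penalty) is simple enough to sketch without a deep learning framework. Here's a minimal illustration using a toy logistic discriminator, whose input gradient has a closed form; in a real CycleGAN-style setup this term would be added to the discriminator loss alongside the adversarial and cycle consistency losses. The function name and toy discriminator are my own, for illustration only.

```python
import numpy as np

def r1_penalty(w, real_batch, gamma=10.0):
    """R1 penalty from "Which Training Methods for GANs do actually
    Converge?": (gamma / 2) * E[ ||grad_x D(x)||^2 ] over real data.

    For a toy logistic discriminator D(x) = sigmoid(w . x), the input
    gradient is D(x) * (1 - D(x)) * w, so no autograd is needed here.
    """
    logits = real_batch @ w
    d = 1.0 / (1.0 + np.exp(-logits))                # D(x) per sample
    grad_sq_norms = (d * (1.0 - d)) ** 2 * np.sum(w ** 2)
    return 0.5 * gamma * np.mean(grad_sq_norms)

rng = np.random.default_rng(0)
w = rng.normal(size=4)
real_batch = rng.normal(size=(32, 4))
penalty = r1_penalty(w, real_batch)
print(penalty)  # non-negative scalar added to the discriminator loss
```

The point of the penalty is to punish the discriminator for having steep gradients on real data, which is what keeps training from oscillating.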


What does this have to do with the paper reading cargo cult?

- Papers that you can replicate by downloading a codebase are useful, but papers that you can replicate from the text alone, without seeing code, are solid gold. If there are any paper reading clubs out there that ask the presenter to replicate the results without looking at the authors' code, I would love to join, not just because the replication is valuable, but because it would narrow down the kinds of papers presented in a valuable way.

- Reading all the most hyped GAN papers, which is basically what I did, would probably not get me an awesome research result in the field of GANs. However, it served me pretty well as a researcher in an adjacent field. In particular, the obscure but golden insight eventually filtered its way into the citations of the hyped, fluffy flagship paper. For alignment research, hanging out in a few paper reading groups that are distantly related to alignment should be useful, even if an alignment paper reading group itself isn't.

- I had to read so many papers to come across three useful ones for this problem. However, I retain the papers that haven't been useful yet: there are decent odds that I've already read the paper I'll need to overcome the next hurdle.

- This type of paper reading, where I gather tools to engineer with, initially seems less relevant for fundamental-concepts research like alignment. However, your general relativity example suggests that Einstein also had a tool-gathering phase leading up to relativity, so ¯\_(ツ)_/¯

Transformer language models are doing something more general

There are two ways a large transformer language model learns: type 1, the gradient descent process, which certainly does not learn information efficiently, taking billions of examples; and type 2, the mysterious in-context learning process, where a transformer learns from ~5 examples in an engineered prompt to do a 'new' task. I think the fundamental question is whether type 2 only works if the task to be learned is represented in the original dataset, or whether it generalizes out of distribution. If it truly generalizes, then the obvious next step is to somehow skip straight to type 2 learning.
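For concreteness, here is what type 2 learning looks like from the outside: a handful of demonstrations packed into a prompt, no gradient updates anywhere. The task (letter reversal) is just an illustrative stand-in, not from any particular experiment.

```python
# Build a five-shot prompt for a toy task: reversing a word's letters.
# Type 2 "learning" happens entirely at inference time, inside the prompt.
examples = [
    ("cat", "tac"),
    ("horse", "esroh"),
    ("zebra", "arbez"),
    ("paper", "repap"),
    ("model", "ledom"),
]

prompt = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
prompt += "\nInput: prompt\nOutput:"  # the query the model must complete

print(prompt)
```

A completion of " tpmorp" would be evidence for type 2 generalization, at least to the extent that letter reversal is absent from the training distribution; the hard part of the fundamental question is establishing that absence.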

Ghosts in the Machine

I tutored college students who were taking a computer programming course. A few of them didn't understand that computers are not sentient.  More than one person used comments in their Pascal programs to put detailed explanations such as, "Now I need you to put these letters on the screen."  I asked one of them what the deal was with those comments. The reply:  "How else is the computer going to understand what I want it to do?"  Apparently they would assume that since they couldn't make sense of Pascal, neither could the computer.

There's been a phase change with the release of Copilot, where this suddenly appears to work, at least for tasks like putting letters on the screen or assembling cookie recipes. "Waiter, there's a ghost in my machine!"

Half-baked AI Safety ideas thread

To steelman, I'd guess this idea applies in the hypothetical where GPT-N gains general intelligence and agency (such as via a mesa-optimizer) just by predicting the next token. 

All AGI safety questions welcome (especially basic ones) [monthly thread]

Lesswrong has a [trove of thought experiments](https://www.lesswrong.com/posts/PcfHSSAMNFMgdqFyB/can-you-control-the-past) about scenarios where arguably the best way to maximize your utility is to verifiably (with some probability) modify your own utility function, starting with the prisoner's dilemma and extending to games with superintelligences predicting what you will do and putting money in boxes etc.
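To make the starting point concrete, here is the prisoner's dilemma payoff structure those thought experiments build on (the numbers are standard illustrative values, not from any particular post):

```python
# Prisoner's dilemma payoffs as (row player, column player) utilities,
# with the standard textbook values.
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Defection strictly dominates: whatever the other player does,
# the row player scores more by playing D...
assert payoffs[("D", "C")][0] > payoffs[("C", "C")][0]
assert payoffs[("D", "D")][0] > payoffs[("C", "D")][0]

# ...yet mutual defection leaves both players worse off than mutual
# cooperation. That gap is what verifiable self-modification (making
# yourself provably a cooperator) is supposed to close.
assert payoffs[("C", "C")][0] > payoffs[("D", "D")][0]
print("defection dominates, yet (C, C) Pareto-dominates (D, D)")
```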

These thought experiments seem to have real world reflections: for example, voting is pretty much irrational under CDT, but paradoxically the outcomes of elections correlate with the utility functions of people who vote, and people who grow up in high trust societies do better than people who grow up in low trust societies, even though defecting is rational.

In addition, humans have an astonishing capability for modifying our own utility functions, such as by joining religions, gaining or losing empathy for animals, etc.

Is it plausible that we could analytically prove that, under a training environment rich in these sorts of scenarios, an AGI that wants to maximize an initially bad utility function would develop the capability to verifiably (with some probability) modify its own utility function, like people do, in order to survive and be released into the world?

GPT-3 Catching Fish in Morse Code

I am reminded of the classic "Oh say it again Dexter" "Omelette du fromage"

It’s Probably Not Lithium

I mean, obviously the causal chain of weight gain is often going to go through caloric intake, but that doesn't make caloric intake the root cause. For example, birth control pills, stress, and soda machines in schools all cause weight gain via increased caloric intake, but are distinct root causes.

[Link] OpenAI: Learning to Play Minecraft with Video PreTraining (VPT)

This generates a decent approximation of the distribution of human actions in an open-world setting. Is it usable for empirical quantilizer experiments?
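In case the connection isn't obvious: a q-quantilizer samples from the top q fraction of a base distribution ranked by utility, rather than taking the argmax. Here's a minimal sketch with the VPT-style policy stood in for by a black-box sampler; the scalar "actions", identity utility, and function name are all illustrative assumptions.

```python
import numpy as np

def quantilizer_sample(base_samples, utility, q=0.1, rng=None):
    """Act as a q-quantilizer: rank draws from the base (human-imitating)
    distribution by utility and sample uniformly from the top q fraction."""
    if rng is None:
        rng = np.random.default_rng()
    utilities = np.array([utility(a) for a in base_samples])
    cutoff = np.quantile(utilities, 1.0 - q)
    top = [a for a, u in zip(base_samples, utilities) if u >= cutoff]
    return top[rng.integers(len(top))]

# Toy stand-in for the human action distribution: scalar draws, with
# utility equal to the action value itself.
rng = np.random.default_rng(0)
base_samples = rng.normal(size=1000)
action = quantilizer_sample(base_samples, utility=lambda a: a, q=0.1, rng=rng)
```

The appeal is that the chosen action always has decent base-distribution probability, so it inherits some of the safety of imitating humans while still steering toward higher utility.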