Comments

These are great points, and ones I did actually think about when brainstorming this idea (if I understand them correctly). I intend to write a more thorough post on this tomorrow with clear examples (I originally imagined this as extracting deeper insights into chess), but to answer these:

  1. I did think of these as translators from a model's actions into natural language, though I don't get the point about extracting things beyond what's in the original model.
  2. I mostly glossed over this part in the brief summary; the motivation comes from how GANs (somewhat unexpectedly?) work even when starting from random noise, with the generator and discriminator still improving each other along the way.
  3. My thought here was for the explainer model's update error vector to come from judging the learner model on new, unseen tasks without the explanation (i.e. how similar its outputs are to the original model's). This way the explainer gets little benefit from just giving the answer directly, since the learner will be tested without it, but if the explanation helps the learner learn at all, the learner's performance improves more (this is basically what the entire idea hinges on); see the sketch below.
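
To make that feedback loop concrete, here is a minimal sketch of how the explainer's reward might be computed. The function and model interfaces are hypothetical, and it assumes both models expose comparable output logits:

```python
import torch
import torch.nn.functional as F

def explainer_feedback(original_model, learner_model, held_out_inputs):
    """Score the explainer by how closely the learner now matches the
    original model on unseen tasks, with NO explanation provided.

    If the explanations genuinely taught the learner something, agreement
    goes up and the explainer gets a larger reward; simply leaking answers
    wouldn't help here, since no explanation is available at test time.
    """
    with torch.no_grad():
        target_logits = original_model(held_out_inputs)
    learner_logits = learner_model(held_out_inputs)  # explanation withheld
    disagreement = F.kl_div(
        F.log_softmax(learner_logits, dim=-1),
        F.softmax(target_logits, dim=-1),
        reduction="batchmean",
    )
    return -disagreement  # higher = learner imitates the original better
```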

> They probably do not know where the real difficulties are, they probably do not understand what needs to be done, they cannot tell the difference between good and bad work, and the funders also can't tell without me standing over their shoulders evaluating everything, which I do not have the physical stamina to do.

This was the sentiment I got after applying to the LTFF with an idea. Admittedly, I couldn't really say whether my idea had been tried before, or whether it wasn't obviously bad, but the conversation basically boiled down to whether I wanted to use this project as a way to grow in the field, rather than to any particular merits or faults of the idea itself. My motivation was really about trying a cool idea that I genuinely believed could practically improve AI safety if it succeeded, while ethically I couldn't commit to staying in the field even if it (likely?) failed, since I like to go wherever my ideas take me.

Since it may be a while before I personally try out the idea, the most productive thing I can do seems to be to share it. It's essentially an attempt at a learning algorithm which 'forces' a model's weights to explain the reasoning/motivations behind its actions. The training process looks kind of like a GAN, with the original model's inner-layer outputs serving as a feature vector. Some GPT-3-esque pretrained model (the 'teacher', or explainer) learns to convert this feature vector into tokens (at first random gibberish), to try to train another GPT-3-esque model (the 'learner') to perform the actions of the original model (i.e. given the same inputs as that model, plus an explanation from the teacher of what it should do). The idea is that explanations which correspond more closely to the right idea will cause the learner to improve faster, which acts as feedback to the teacher that its explanations are getting better. The end result of this procedure, ideally, is a way to get the exact reasoning behind any action as readable text.
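
For anyone who prefers code to prose, here is a rough sketch of what one training step of this loop could look like. Everything in it (the module interfaces, the REINFORCE-style update for the explainer, using KL divergence as the imitation loss) is my own guess at one way to wire it up, not a worked-out design:

```python
import torch
import torch.nn.functional as F

def training_step(original_model, explainer, learner,
                  batch_inputs, held_out_inputs,
                  learner_opt, explainer_opt):
    """One GAN-like step: the explainer translates the frozen model's
    inner-layer activations into text, the learner trains to imitate the
    frozen model given that text, and the explainer is rewarded by the
    learner's later performance on unseen inputs with no explanation."""
    # 1. Sample the frozen black box: its outputs plus an inner-layer feature vector.
    with torch.no_grad():
        targets, features = original_model(batch_inputs, return_hidden=True)

    # 2. The explainer turns the feature vector into tokens (random gibberish at first).
    explanation = explainer.generate(features)

    # 3. The learner imitates the original model given the input AND the explanation.
    learner_logits = learner(batch_inputs, explanation)
    imitation_loss = F.kl_div(
        F.log_softmax(learner_logits, dim=-1),
        F.softmax(targets, dim=-1),
        reduction="batchmean",
    )
    learner_opt.zero_grad()
    imitation_loss.backward()
    learner_opt.step()

    # 4. The explainer's reward: how well the learner now does on unseen tasks
    #    with no explanation at all, so directly leaking answers earns nothing.
    with torch.no_grad():
        held_out_targets, _ = original_model(held_out_inputs, return_hidden=True)
        no_expl_logits = learner(held_out_inputs, explanation=None)
        reward = -F.kl_div(
            F.log_softmax(no_expl_logits, dim=-1),
            F.softmax(held_out_targets, dim=-1),
            reduction="batchmean",
        )

    # 5. REINFORCE-style update: make explanations that helped more likely.
    explainer_loss = -reward * explainer.log_prob(explanation, features)
    explainer_opt.zero_grad()
    explainer_loss.backward()
    explainer_opt.step()
```

The key design choice is that the explainer is never graded on the learner's performance *with* the explanation in hand, only on how much closer the learner gets to the original model afterwards, which is the signal the whole idea hinges on.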

In my mind this kind of procedure avoids many of the problems in this list. You end up not relying on trusting the output of the model; instead you treat it as a static black box whose inner-layer outputs you sample, until this procedure produces an effective copy of it with a mechanism for obtaining its inner reasoning. If the original model considers something nefarious to be the optimal way of doing something but wants to hide it from its outputs, that reasoning will still have to be conveyed to the 'learner' model for it to better mimic the original. If any of this just utterly misses one of the points above, I'd like to hear it, and if anyone's interested in hearing more about this idea, I'll make a post with the rest of my original submission and the other thoughts that couldn't fit into the word limit for LTFF submissions.

Answer by eeegnu, May 25, 2022

For competitive programming questions, codeforces.com. It has a large audience of highly skilled competitive programmers, and someone will normally help you if it's an interesting question.

I'd rate these highly. There are many forms of anomalocarids (https://en.m.wikipedia.org/wiki/Radiodonta#/media/File%3A20191201_Radiodonta_Amplectobelua_Anomalocaris_Aegirocassis_Lyrarapax_Peytoia_Laggania_Hurdia.png), and it looks to have picked a wide variety beyond just canadensis, but I'm thoroughly impressed that it got the form right in nearly all 10.

A prompt I'd love to see: "Anomalocaris canadensis flying through space." I'm really curious how well it does with an extinct species that has very few existing artistic depictions. No text->image model I've played with so far has managed to create a convincing anomalocaris, though one interestingly did know it was an aquatic creature and kept outputting lobsters.

> Never delete code or results!!!

There's a tool I've been interested in using (if it exists): basically a Jupyter notebook, but one which saves all outputs to tmp files (possibly truncating at the 1MB mark or something) and which maintains a tree history of the state of cells (i.e. if you hit ctrl-z a few times and start editing, it creates a branch). Neither of these is particularly memory-heavy, but both would have saved me in the past if they were hidden options to restore old state.
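
In the meantime, the first half of that (persisting every cell's output to tmp files) can be approximated with a small IPython hook. This is just a sketch: the directory layout, filenames, and 1MB cap are arbitrary choices on my part, and it assumes a reasonably recent IPython where the post_run_cell hook receives the execution result:

```python
# Run this once in a notebook cell (or an IPython startup file) to mirror every
# executed cell's source and a truncated repr of its result into tmp files.
import os
import time

SNAPSHOT_DIR = os.path.join("/tmp", "nb_snapshots", time.strftime("%Y%m%d-%H%M%S"))
os.makedirs(SNAPSHOT_DIR, exist_ok=True)
MAX_BYTES = 1_000_000  # truncate outputs around the 1MB mark

def _save_cell(result):
    """post_run_cell hook; `result` is IPython's ExecutionResult."""
    path = os.path.join(
        SNAPSHOT_DIR,
        f"cell_{result.execution_count}_{time.strftime('%H%M%S')}.txt",
    )
    output = repr(result.result) if result.result is not None else ""
    with open(path, "w") as f:
        f.write(result.info.raw_cell)
        f.write("\n\n# --- output ---\n")
        f.write(output[:MAX_BYTES])

get_ipython().events.register("post_run_cell", _save_cell)
```

The branching tree of cell states is the harder half and would presumably need frontend support, so this only covers the output-persistence side.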

I'd also add: if you had a bug and it took real effort/digging to find an online solution, archive the link to that solution in a list (preferably with date tags). This has worked better for me than painstakingly re-searching or trying to dig through browser history when needed.

I'll probably end up thinking about this in the background for a while and jotting down any interesting cases, in case they can accumulate into a nice generalizable thing (or maybe I'll stumble upon someone else who's made such an analysis before).

Where I see your example sharing a common idea is that one party makes what appears to be a suboptimal decision (e.g. if they just wanted to attract top talent, a salary at the top of the spectrum would suffice), which leaves the other party to infer the true reasoning that would make it an optimal decision (i.e. it assumes the deciding party is rational).

Another case I've seen recently was in a thread discussing the non-shopper problem.

> Another possible factor is that when people are unable to evaluate quality directly, they use price as a proxy.

> You don't want a crappy realtor who gives you bad information. But you can't tell which realtors are good and which aren't, because you don't know enough about real estate. So, you think, "You get what you pay for" and go with one that charges more, figuring that the higher price corresponds with higher quality.

Here an exchange also happens, except this time it's your money for something you have little domain knowledge of. I imagine the peak of the price distribution chosen by someone concerned about quality, but without a good way to assess it, would fall somewhere around a standard deviation above the mean price.