From my experience of playing VR games on mobile devices (Quest 1 and Quest 2), the majority of in-game characters look much better than this and it doesn't impact the framerate at all. This seems like a 100% stylistic choice.

"... the existing literature on the influence of dopamine enhancing agents on working memory provides reasonable support for the hypothesis that augmenting dopamine function can improve working memory."
Pharmacological manipulation of human working memory, 2003

I'd be really interested in a head-to-head comparison with R on a bunch of real-world examples of writing down beliefs that were not selected to favor either R or Squiggle. R because at least in part specifying and manipulating distributions seems to require less boilerplate than in Python.

I wonder what happens when you ask it to generate
> "in the style of a popular modern artist <unknown name>"
> "in the style of <random word stem>ism".
You could generate both types of prompts with GPT-3 if you wanted so it would be a complete pipeline.
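A minimal sketch of how the two prompt types could be assembled, assuming small hard-coded lists stand in for what GPT-3 would propose. The artist names, word stems, and function names here are all hypothetical placeholders, not part of any real pipeline.

```python
import random

# Hypothetical placeholder vocabularies (assumptions for illustration only);
# in the pipeline described above, a language model such as GPT-3 would
# propose these instead of a fixed list.
ARTIST_NAMES = ["Mira Velloux", "Tomas Edrik", "Ana Korvale"]
WORD_STEMS = ["lumin", "fract", "verd", "echo"]


def artist_prompt(rng):
    """Build a prompt of the first type: an unknown modern artist."""
    name = rng.choice(ARTIST_NAMES)
    return f"in the style of a popular modern artist {name}"


def ism_prompt(rng):
    """Build a prompt of the second type: a random word stem plus 'ism'."""
    stem = rng.choice(WORD_STEMS)
    return f"in the style of {stem}ism"


# Seeded generator so repeated runs produce the same prompts.
rng = random.Random(0)
prompts = [artist_prompt(rng), ism_prompt(rng)]
for p in prompts:
    print(p)
```

Swapping the fixed lists for model-generated names and stems would turn this into the complete pipeline mentioned above; the image model would then be run once per prompt.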

"Generate conditioned on the new style description" may be ready to be used even if "generate conditioned on an instruction to generate something new" is not. This is why a decomposition into new style description + image conditioned on it seems useful.

If this is successful, then more of the high-level idea generation involved can be shifted onto a language model by letting it output a style description. Leave blanks in it and run it for each blank, while ensuring generations form a coherent story.

>"<new style name>, sometimes referred to as <shortened version>, is a style of design, visual arts, <another area>, <another area> that first appeared in <country> after <event>. It influenced the design of <objects>, <objects>, <more objects>. <new style name> combined <combinatorial style characteristic> and <another style characteristic>. During its heyday, it represented <area of human life>, <emotion>, <emotion> and <attitude> towards <event>."

DALL-E can already model the distribution of possible contexts (image backgrounds, other objects, states of the object) + possible prompt meanings. It can also go from the description 1) to high-level concepts, 2) to ideas for implementing these concepts (relative placement of objects, ideas for how to merge concepts), 3) to low-level details. All within 1 forward pass, for all prompts! This is what astonished me most about DALL-E 1.

Importantly, placing, implementing, and combining concepts in a picture is done in a novel way without a provided specification. For style generation, it would need to model a distribution over all possible styles and use each style, all without a style specification. This doesn't seem much harder to me and could probably be achieved with slightly different training. The procedure I described is just supposed to introduce helpful stochasticity in the prompt and use an established generation conduit.

I wonder if the macronutrient ratios shifted. This would influence the total calories you end up with because absorption rates are different for different macronutrients. How the food is processed also influences absorption (as well as the total amount of calories, which may not be reflected on the package).

If these factors changed, calories today don't mean exactly the same thing as calories in 1970.

Since the FDA allows a substantial margin of error for calories, maybe producers also developed a bias that lets them stay within this margin of error but show fewer calories on the package?

Maybe this is all controlled for in studies, dunno, I just did a couple of google searches and had these questions.

I could imagine that OpenAI getting top talent to ensure their level of research achievements while also filtering people they hire by their seriousness about reducing civilization-level risks is too hard. Or at least it could easily have been infeasible 4 years ago.

I know a couple of people at DeepMind and none of them have reducing civilization-level risks as one of their primary motivations for working there, as I believe is the case with most of DeepMind.

I have an argument for capabilities research being good but with different assumptions. The assumption that's different is that we would progress rapidly towards AGI capabilities (say, in 10 years).

If we agree that 95% of progress towards alignment happens very close to AGI, then the duration of the interval between almost-AGI and AGI is the most important duration.

Suppose the ratio of capabilities research to alignment research is low (probably what most people here want). Then AI researchers and deployers will have the option to say "Look, so many resources were put towards safety already, it's actually fine, we're employing the 2027 comprehensive robustness benchmarks, and IDA+, in fact our quality assurance team is implementing it right now, no need to worry", prompting decision-makers to relax and let it go. In this scenario the Almost-AGI -> AGI interval is 2 years.

On the other hand, if it's high, this may cause decision-makers to freak out when they have their almost-AGI on the table and contain the development (e.g. with regulation). This may primarily be mediated via easier-to-avoid public failures and accidents. Or by AI safety people quickly and loudly demonstrating that we don't yet have the tools to avoid even these easier-to-avoid failures. Then regulation extends the Almost-AGI -> AGI interval to 8 years.

The point is that this is 4x as much time to work on the 95% of safety research progress that happens close to AGI.
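The arithmetic behind that comparison, as a small sketch. The interval lengths are the illustrative numbers from the two scenarios above, not estimates, and the variable names are made up for this example.

```python
# Illustrative numbers from the two scenarios above (not forecasts).
interval_low_ratio_years = 2   # decision-makers relax, no extra regulation
interval_high_ratio_years = 8  # regulation slows the final stretch

# How much longer the almost-AGI -> AGI interval is in the second scenario.
time_multiplier = interval_high_ratio_years / interval_low_ratio_years
print(f"{time_multiplier:.0f}x as much time in the almost-AGI -> AGI interval")
```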

  • When you say that coherent optimizers are doing some bad thing, do you imply that it would always be a bad decision for the AI to make the goal stable? But wouldn't it heavily depend on what other options it thinks it has, and in some cases it may be worth a shot? If such a decision problem is presented to the AI even once, it doesn't seem good.
  • The stability of the value function seems like something multidimensional, so perhaps it doesn't immediately turn into a 100% hardcore explicit optimizer forever, but there is at least some stabilization. In particular, bottom-up signals that change the value function most drastically may be blocked.
  • AI can make its value function more stable to external changes, but it can also make it more malleable internally to partially compensate for Goodharting. The end result for outside actors though is that it only gets harder to change anything.
  • Edit: BTW, I've read some LW articles on Goodharting but I'm also not yet convinced it will be such a huge problem at superhuman capability levels - seems uncertain to me. Some factors may make it worse as you get there (complexity of the domain, dimensionality of the space of solutions), and some factors may make it better (the better you model the world, the better you can optimize for the true target). For instance, as the model gets smarter, the problems from your examples seem to be eliminated: in 1, it would optimize end-to-end, and in 2, the quality of the decisions would grow (if the model had access to the ground truth value function all along, then it would grow because of better world models and better tree search for decision-making). If the model has to check in and use feedback from the external process (human values) to not stray off course, then as it gets smarter it discovers more efficient ways to collect the feedback, has better priors, etc.

Every other day I have a bunch of random questions related to AI safety research pop up but I'm not sure where to ask them. Can you recommend any place where I can send these questions and consistently get at least half of them answered or discussed by people who are also thinking about it a lot? Sort of like an AI safety StackExchange (except there's no such thing), or a high-volume chat/discord. I initially thought about LW shortform submissions, but it doesn't really look like people are using the shortform for asking questions at all.
