All of alexlyzhov's Comments + Replies

Wow, the Zvi example is basically what I've been doing recently with hyperbolic discounting too, after spending a fair amount of time thinking about Joe Carlsmith's "Can you control the past". It seems to work. "It gives me a lot of the kind of evidence about my future behavior that I like" is now the dominant reason behind certain decisions.

How much time do you expect the form, the coding test, and the interview to take for an applicant?

30 min, 45 min, 20-30 min (respectively)

This idea tries to discover translations between the representations of two neural networks, but without necessarily discovering a translation into our representations.


I think this has been under investigation for a few years in the context of model fusion in federated learning, model stitching, and translation between latent representations in general.

Relative representations enable zero-shot latent space communication - an analytical approach to matching representations (though this is new work and may not be that good; I haven't checked)

Git Re-B...

1 Maxwell Clarke 1mo
Thanks for these links; the top one especially is pretty interesting work.

I don't expect Putin to use your interpretation of "d" instead of his own interpretation of it, which he publicly advertises whenever he gives a big public speech on the topic.

From the latest speech:

> In the 80s they had another crisis they solved by "plundering our country". Now they want to solve their problems by "breaking Russia".

This directly references an existential threat.

From the speech a week ago:

> The goal of that part of the West is to weaken, divide and ultimately destroy our country. They are saying openly now that in 1991 they managed...

From my experience of playing VR games on mobile devices (Quest 1 and Quest 2), the majority of in-game characters look much better than this and it doesn't impact the framerate at all. This seems like a 100% stylistic choice.

"... the existing literature on the influence of dopamine enhancing agents on working memory provides reasonable support for the hypothesis that augmenting dopamine function can improve working memory."
Pharmacological manipulation of human working memory, 2003

I'd be really interested in a head-to-head comparison with R on a bunch of real-world examples of writing down beliefs that were not selected to favor either R or Squiggle. R because, at least in part, specifying and manipulating distributions seems to require less boilerplate there than in Python.

I wonder what happens when you ask it to generate
> "in the style of a popular modern artist <unknown name>"
> "in the style of <random word stem>ism".
You could generate both types of prompts with GPT-3 if you wanted, so it would be a complete pipeline.

"Generate conditioned on the new style description" may be ready to be used even if "generate conditioned on an instruction to generate something new" is not. This is why a decomposition into new style description + image conditioned on it seems useful.

If this is successful, then more of the...

I wonder if the macronutrient ratios shifted. This would influence the total calories you end up with because absorption rates are different for different macronutrients. How the food is processed also influences absorption (as well as the total amount of calories that may not be reflected on the package).

If these factors changed, calories today don't mean exactly the same thing as calories in 1970.

Since the FDA allows a substantial margin of error for calories, maybe producers also developed a bias that allows them to stay within this margin of error but show fewer calories on the package?

Maybe this is all controlled for in studies, dunno, I just did a couple of google searches and had these questions.

2 Ege Erdil 5mo
I have no clue about this, unfortunately.

I could imagine that OpenAI getting top talent to ensure their level of research achievements while also filtering people they hire by their seriousness about reducing civilization-level risks is too hard. Or at least it could easily have been infeasible 4 years ago.

I know a couple of people at DeepMind and none of them have reducing civilization-level risks as one of their primary motivations for working there, as I believe is the case with most of DeepMind.

I have an argument for capabilities research being good but with different assumptions. The assumption that's different is that we would progress rapidly towards AGI capabilities (say, in 10 years).

If we agree 95% of progress towards alignment happens very close to AGI, then the duration of the interval between almost-AGI and AGI is the most important duration.

Suppose the ratio of capabilities research to alignment research is low (probably what most people here want). Then AI researchers and deployers will have the option to say "Look, so many resources w...

  • When you say that coherent optimizers are doing some bad thing, do you imply that it would always be a bad decision for the AI to make the goal stable? But wouldn't it heavily depend on what other options it thinks it has, and in some cases maybe worth the shot? If such a decision problem is presented to the AI even once, it doesn't seem good.
  • The stability of the value function seems like something multidimensional, so perhaps it doesn't immediately turn into a 100% hardcore explicit optimizer forever, but there is at least some stabilization. In particula...

Every other day I have a bunch of random questions related to AI safety research pop up but I'm not sure where to ask them. Can you recommend any place where I can send these questions and consistently get at least half of them answered or discussed by people who are also thinking about it a lot? Sort of like an AI safety StackExchange (except there's no such thing), or a high-volume chat/discord. I initially thought about LW shortform submissions, but it doesn't really look like people are using the shortform for asking questions at all.

2 Jan Czechowski 6mo
There's an AI safety camp slack with a #no-stupid-questions channel. I think people stay there even after the camp ends (I'm still there although this year's edition ended last week). So you can either apply for next year's edition (which I very much recommend!) or maybe contact the organizers to ask if they can add you without you being an AISC participant/alumnus? Just a disclaimer: I'm not sure how active this slack is between camps, and it might be that a lot of people leave after the camp ends.
The closest thing to an AI safety StackExchange is the stampy wiki, with loads of asked & answered questions. It also has a discord.
The [] discord [] has two alignment channels with reasonable volume (#alignment-general and #alignment-beginners). These might be suitable for your needs.

But the mere fact that one network may be useful for many tasks at once has been extensively investigated since the 1990s.

To receive epistemic credit, make sure that people would know you haven't made all possible predictions on a topic this way and then revealed the right one after the fact. You can probably publish plaintext metadata for this.
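One way to make this concrete is a salted hash commitment with plaintext metadata attached. This is only a sketch under assumptions: it assumes "this way" refers to publishing a hash of a sealed prediction, and the function names, the metadata fields (`topic`, `commitment`), and the "index of total" convention are all hypothetical illustrations, not an established protocol.

```python
import hashlib
import secrets


def commit(prediction: str, topic: str, index: int, total: int) -> tuple[dict, str]:
    """Seal one prediction. Returns (public_record, secret_nonce).

    The public record is what you publish now; the nonce and the
    prediction text stay private until the reveal.
    """
    # Random salt so that short or guessable predictions cannot be brute-forced.
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256(f"{nonce}:{prediction}".encode("utf-8")).hexdigest()
    public_record = {
        "topic": topic,                        # plaintext: what the prediction is about
        "commitment": f"{index} of {total}",   # plaintext: how many sealed predictions exist on this topic
        "sha256": digest,
    }
    return public_record, nonce


def verify(public_record: dict, prediction: str, nonce: str) -> bool:
    """Check a revealed prediction and nonce against the published digest."""
    expected = hashlib.sha256(f"{nonce}:{prediction}".encode("utf-8")).hexdigest()
    return expected == public_record["sha256"]


record, nonce = commit("Hypothetical claim X happens by 2030", "hypothetical topic", 1, 1)
print(record["commitment"], verify(record, "Hypothetical claim X happens by 2030", nonce))
```

Publishing "1 of 1" in plaintext is what lets readers see that you did not seal several contradictory predictions on the same topic and reveal only the winner; it still relies on you publishing every record in the same public place.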

An update on Israel:

> Citizenship is typically granted 3 months after arrival; you can fill out a simple form to waive this waiting period, however.
I don't think this is the case, because you receive an internal citizen ID immediately after a document check, but they only give you a passport you can use for visas after 3 months (which you can also spend outside the country).
Waiving the waiting period is possible in 2022, but you have to be smart about it and go to exactly the right place to do it (because many local governments are against it).

> Isra...

Actually, the Metaculus community prediction has a recency bias:
> approximately sqrt(n) new predictions need to happen in order to substantially change the Community Prediction on a question that already has n players predicting.

In this case n=298, so the prediction should change substantially after about sqrt(n)≈17.3, i.e. roughly 18, new predictions (usually this takes up to a few days). Over the past week there were almost this many predictions, and the AGI community median has shifted 2043 -> 2039, and the 30th percentile is 8 years.
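The arithmetic behind that threshold, as a quick sketch. It only encodes the quoted sqrt(n) rule of thumb; the function name is made up for illustration and this is not Metaculus's actual aggregation code.

```python
import math


def predictions_to_shift(n_players: int) -> int:
    """Rule-of-thumb count of new predictions needed to substantially move
    the community prediction, using the quoted sqrt(n) heuristic, rounded up."""
    return math.ceil(math.sqrt(n_players))


print(round(math.sqrt(298), 2))   # 17.26
print(predictions_to_shift(298))  # 18
```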

No disagreements here; I just want to note that if "the EA community" waits too long for such a pivot, at some point AI labs will probably be faced with people from the general population protesting, because even now a substantial share of the US population views AI progress in a very negative light. Even if these protests don't accomplish anything directly, they might indirectly affect any future efforts. For example, an EA-run fire alarm might be compromised a bit because the memetic ground would already be captured. In this case, the concept of "AI r...

I’m not sure I would agree. The post you linked to is titled “A majority of the public supports AI development.” Only 10% of the population is strongly opposed to it. You’re making an implicit assumption that the public is going to turn against the technology in the next couple of years, but I see no reason to believe that. In the past, public opinion has really only turned against a technology following a big disaster. But we may not see a big AI-induced disaster before a change in public opinion becomes irrelevant to AGI.

ICML 2022 reviews dropped this week.

"What if outer space were udon" (CLIP guided diffusion did really well; this is cherry-picked though)

"colourless green ideas sleep furiously"

This is a great example of how even a single iteration on the prompt can vastly improve the results. Here are the results when using your quotes exactly: Pretty dreadful! But here they are, with the exact same prompt, except with ", digital art" appended to it:

Are PaLM outputs cherry-picked?

I reread the description of the experiment and I'm still unsure.

The protocol on page 37 goes like this:
- the 2-shot exemplars used for few-shot learning were not selected or modified based on model output. I infer this from the line "the full exemplar prompts were written before any examples were evaluated, and were never modified based on the examination of the model output".
- greedy decoding is used, so they couldn't filter outputs given a prompt.

What about the queries (full prompt without the QAQA few-shot data part)? A...

These games are really engaging for me and haven't been named:

Eleven Table Tennis. Ping-pong in VR (+ multiplayer and tournaments):

Racket NX. This one is much easier but you still move around a fair bit. The game is "Use the racket to hit the ball" as well.

Synth Riders. An easier and more chill Beat Saber-like game:

Holopoint. Archery + squats, gets very challenging on later levels:


Some gameplay videos for excellent games that have been named:

Beat Saber. "The VR game". You can load songs from the community library using mods.

Thrill of the Fight (boxin...

You can buy fladrafinil or flmodafinil without any process (see reddit for reports, seems to work much better than adrafinil)

One thing you probably won't find in an evidence review is that it feels more pleasant for me to type in Colemak rather than in QWERTY years after I made the switch. That's a pretty huge factor as well considering that we put so many hours into typing.

I would also highlight this as seemingly by far the most wrong point. Consider how many Omicron cases we now have, and we still don't know for sure that it's significantly less severe. Now consider how many secret infections in humans, with the various novel strains you're working with, you would need to stage in a controlled environment to be confident enough that a given strain is less severe and thus worth releasing.

Does anyone have a good model of how to reconcile

1) a pretty large psychosis rate in this survey, a bunch of people saying that their friends got mental health issues after using psychedelics, anecdotal experiences and stories about psychedelic-induced psychosis in the general cultural field


2) Studies finding no correlation, or, ...

My mental model for the difference between the two results is based on the following:
1) The studies by Krebs and Johansen are analyses based on the "National Survey on Drug Use and Health (...), randomly selected to be representative of the adult population in the United States".
2) The ACX reader population is not representative of the US population; in fact, it might be skewed in some dimensions that are very relevant here.
3) There are significant differences in the fraction of each sample that reports psychedelic use.
3.1) In Krebs and Johansen (2013, 2015), ~13% report lifetime psychedelic use, while in the subsample of the ACX reader survey considered in this report it is ~100%.
One important aspect tying this together: I would assume ACX readers do not have the same distribution of genes associated with intelligence as the general population, and there is evidence of an overlap between those genes and the genes associated with bipolar disorder. This genetic overlap can explain a higher susceptibility to psychotic-like experiences with higher intelligence, even without any particular diagnosis. Furthermore, by considering the multiple types of psychotic disorder, [] has found the prevalence in the general population to be ~3%, which does not fall too far from the 4.5% that responded with a firm "yes" to the survey.

This study at least didn't ask about the length of the psychotic episode, so it seems compatible with the users having had short-term psychotic episodes that didn't cause long-term damage.

Speculatively, a short-term psychosis could even be part of what causes long-term mental health benefits, if e.g. psychedelics do it via a relaxing of priors and the psychotic episode is the moment when they are the most relaxed before stabilizing again, in line with the neural annealing analogy:

The hypothesized flattening of the brain’s (variational free) energy landsc...

"Training takes between 24 and 48 hours for most models"; I assumed both are trained within 48 hours (even though this is not precise and may be incorrect).

Ohh OK I think since I wrote "512 TPU cores" it's 512x512, because in Appendix C here they say it corresponds to 512x512.

Deep or shallow version?

It should be referenced here in Figure 1:

"I have heard that they get the details wrong though, and the fact that they [Groq] are still adversing their ResNet-50 performance (a 2015 era network) speaks to that."

I'm not sure I fully get this criticism: ResNet-50 is the most standard image recognition benchmark and unsurprisingly it's the only (?) architecture that NVIDIA lists in their benchmarking stats for image recognition as well:

This is a very neat idea, is there any easy way to enable this for Android and Google Calendar notifications? I guess not

Yep, the first google result http://xn--80akpciegnlg.xn--p1ai/preparaty-dlya-kodirovaniya/disulfiram-implant/ (in Russian) says that you use an implant with 1-2g of the substance for up to 5-24 months and that "the minimum blood level of disulfiram is 20 ng/ml". This paper says "Mild effects may occur at blood alcohol concentrations of 5 to 10 mg/100 mL."

Ethereum above 0.05 BTC: 70%

This already happened today (a day after this post).

I would have put this waay higher due to the value proposition of ethereum + the massive ethereum ecosystem + the fact that it hasn't rallied that much yet against BTC compared to its 2017 values + bright future plans for ethereum + competitors forced to integrate with ethereum and lacking some of its properties. IDK if these are objectively good reasons for expecting growth, but they are there in my personal model.

The prediction about CV doesn't seem to have aged that well in my view. Others are going fairly well!

gwern has recently remarked that one cause of this is supply and demand disruptions and this may be a temporary phenomenon in principle.

I appreciate questioning of my calculations, thanks for checking!

This is what I think about the previous avturchin calculation: I think that may have been a misinterpretation of the DeepMind blogpost. In the blogpost they say "The AlphaStar league was run for 14 days, using 16 TPUs for each agent". But I think it might not be 16 TPU-days for each agent; it's 16 TPUs for 14/n_agent=14/600 days for each agent. And 14 days was for the whole League training where agent policies were trained consecutively. Their wording is indeed not very clear but you can look at t...

Probably that []:

My calculation for AlphaStar: 12 agents * 44 days * 24 hours/day * 3600 sec/hour * 420*10^12 FLOP/s * 32 TPUv3 boards * 33% actual board utilization = 2.02 * 10^23 FLOP which is about the same as AlphaGo Zero compute.
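The AlphaStar product above can be checked with a few lines of Python. This is just a sketch that multiplies the factors exactly as quoted in the comment (420 TFLOP/s peak per TPUv3 board, 33% utilization); the function and constant names are made up for illustration.

```python
TPU_V3_BOARD_PEAK_FLOPS = 420e12  # peak FLOP/s per TPUv3 board, as quoted above


def training_flop(agents: int, days: float, boards: int, utilization: float) -> float:
    """Total training FLOP: agents * seconds * peak FLOP/s * boards * utilization."""
    seconds = days * 24 * 3600
    return agents * seconds * TPU_V3_BOARD_PEAK_FLOPS * boards * utilization


alphastar = training_flop(agents=12, days=44, boards=32, utilization=0.33)
print(f"{alphastar:.2e}")  # 2.02e+23
```

Swapping in other values for agents, days, boards, or utilization shows how strongly the estimate depends on how the blog post's wording is interpreted.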

For 600B GShard MoE model: 22 TPU core-years = 22 years * 365 days/year * 24 hours/day * 3600 sec/hour * 420*10^12 FLOP/s/TPUv3 board * 0.25 TPU boards / TPU core * 0.33 actual board utilization = 2.4 * 10^22 FLOP.

For 2.3B GShard dense transformer: 235.5 TPU core-years = 2.6 * 10^23 FLOP (using the same per-core conversion).

Meena was trained for 30 days on a TPUv3 pod with 2048 c...

What is the GShard dense transformer you are referring to in this post?
A previous calculation on LW gave 2.4 x 10^24 for AlphaStar (using values from the original AlphaStar blog post), which suggested that the trend was roughly on track. The differences between the 2 calculations are (your values first):
- Agents: 12 vs 600
- Days: 44 vs 14
- TPUs: 32 vs 16
- Utilisation: 33% vs 50% (I think this is just estimated in the other calculation)
Do you have a reference for the values you use?

Here's a list of papers related to reasoning and RL for language models that were published in fall 2020 and that have caught my eye - you may also find it useful if you're interested in the topic.

Learning to summarize from human feedback - finetune GPT-3 to generate pieces of text to accomplish a complex goal, where performance ratings are provided by humans.
Keep CALM and Explore: Language Models for Action Generation in Text-based Games - an instance of the selector approach where a selector chooses between generated text candidates, similarly to "GeDi...

Suppose some variant like the SA one is vaccine-evading and some people will have to vaccinate a second time with an adapted vaccine. What are our priors for the safety of vaccinating repeatedly this way (either with the same or different delivery methods)? If we have two vaccines that are pretty safe, are side effects of vaccinating with the first one and then vaccinating with the second, similar one almost surely on the order of side effects from using just one kind of vaccine?

I would expect the prior to be to end up with something similar to the flu vaccine, which we try to get everyone to take approximately yearly and have more safety concerns about people not taking it.

This is the link to Yudkowsky discussion of concept merging with the triangular lightbulb example:

Generated lightbulb images:

Given that the details in generated objects are often right, you can use superresolution neural models to upscale the images to a needed size.

On prior work: they cited l-lxmert (Sep 2020) and TReCS (Nov 2020) in the blogpost. These are the baselines it seems.

The quality of objects and scenes there is far below that of the new model. They are often just garbled and don't look quite right.

But more importantly, the best they could sometimes understand from the text is something like "a zebra is standing in the field", i.e. the object and the background; all the other stuff was lost. With this model, you can actually use much more language fea...

Great approach. I use it in a slightly different way - I have a rule that each time I open a website from a list, I have to report it to my assistant, and I have to report a good enough reason. I also use website blockers on all platforms as an additional cost (Block Site on Chrome, Screen Timer on Android). But website blockers don't work that well on their own - I sometimes have to visit those websites for legitimate reasons and so I have to disable a blocker, and after a while I slip and the bar for disabling them gets too low.

Super thoughtful post!

I get the feeling that I'm more optimistic about post-hoc interpretability approaches working well in the case of advanced AIs. I'm referring to the ability of an advanced AI in the form of a super large neural network-based agent to take another super large neural network-based agent and verify its commitment successfully. I think this is at least somewhat likely to work by default (i.e. scrutinizing advanced neural network-based AIs may be easier than obfuscating intentions). I also think this may potentially not require that much i...

I agree that the difference in datasets between 1BW and PTB makes precise comparisons impossible. Also, the "human perplexity = 12" on 1BW is not measured directly. It's extrapolated from their constructed "human judgement score" metric, based on values of both the "human judgement score" and perplexity metrics for pre-2017 language models, with the authors noting that the extrapolation is unreliable.

With enough iterations, we could end up with a powerful self replicating memetic agent with arbitrary goals and desires coordinating with copies and variations of itself to manipulate humans and gain influence in the real world.

I initially felt cold towards the whole article, but now I mostly agree.

The goals of text agents might be programmable by humans directly (consider the economic pressure towards creating natural language support agents / recommendation systems / educators / etc). Prompts in their current form 1) only have significant influence over...