# All of Marius Hobbhahn's Comments + Replies

Trends in GPU price-performance

We're currently looking deeper into how we can extrapolate this trend. Our preliminary, high-uncertainty estimate is that it is more likely to slow down than to speed up over the foreseeable future.

Trends in GPU price-performance

The comparison lines (dotted) have completely arbitrary y-intercepts. You should only take the slope seriously.
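
To illustrate why only the slope carries information here, a minimal sketch: two log-linear trends with the same doubling time but different vertical offsets yield the same fitted slope. The data, the 3-year doubling time, and the function names are made up for illustration and are not taken from the post.

```python
import math

def log_linear_fit(years, values):
    """Least-squares fit of log10(value) = intercept + slope * year."""
    logs = [math.log10(v) for v in values]
    n = len(years)
    year_mean = sum(years) / n
    log_mean = sum(logs) / n
    sxx = sum((y - year_mean) ** 2 for y in years)
    sxy = sum((y - year_mean) * (l - log_mean) for y, l in zip(years, logs))
    slope = sxy / sxx
    intercept = log_mean - slope * year_mean
    return slope, intercept

def doubling_time(slope):
    """Years needed for the fitted quantity to double."""
    return math.log10(2) / slope

# Hypothetical data: both series double every 3 years; series_b is shifted up 10x.
years = list(range(2006, 2022))
series_a = [2 ** ((y - 2006) / 3.0) for y in years]
series_b = [10 * v for v in series_a]

slope_a, intercept_a = log_linear_fit(years, series_a)
slope_b, intercept_b = log_linear_fit(years, series_b)

print(doubling_time(slope_a), doubling_time(slope_b))  # same doubling time
print(intercept_b - intercept_a)  # a pure vertical shift on the log axis
```

Moving a comparison line up or down on a log-scale plot changes only the intercept, so the doubling time read off the slope is unaffected.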

1Leon Lang12h
That might be worth mentioning, as I wondered about the same. (I didn't realize until now that all the slope curves start at the same point on the left-hand side of the figure.)
Our mental building blocks are more different than I thought

The building block concept was just something that I found intuitive. It's not backed by rigorous research or intense thinking. I think they can easily be called tasks or traits or other things that relate to psychological findings. You should really think of this post as something that I found worth sharing without doing a lot of background reading.

Our mental building blocks are more different than I thought

I'm not sure either to be fair. My friend with aphantasia says it doesn't make that much of a practical difference for her. But it's hard to compare since we don't know the counterfactual.

I'm generally pretty uncertain how large the differences are, but some discussions have led me to believe that they are bigger than I expected. At some point I was just like "Wait, you can't rotate the shape in your head?" or "What do you mean, you feel music?".

I think there are a ton of interesting questions to dive into. Probably a lot have already been answered by psychologists. I think the independence question is very interesting as well.

Investigating causal understanding in LLMs

I would expect the results to be better on, let's say, PaLM. I would also expect it to base more of its answers on content than form.

I think there are a ton of experiments in the direction of natural story plots that one could test and I would be interested in seeing them tested. The reason we started with relatively basic toy problems is that they are easier to control. For example, it is quite hard to tell whether the model based its answer on form or content in a natural story context.

Overall, I expect there to be many further research projects and papers in this direction.

Eliciting Latent Knowledge (ELK) - Distillation/Summary

Thank you for the feedback. I will update the post to be more clear on imitative generalization.

Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc

I think the claim you are making is correct but it still misses a core point of why some people think that Bayes nets are more interpretable than DL.
a) Complexity: a neural network is technically a Bayes net. It has nodes and variables and it is non-cyclical. However, when people talk about the comparison of Bayes nets vs. NNs, I think they usually mean a smaller Bayes net that somehow "captures all the important information" of the NN.
b) Ontology: When people look at a NN they usually don't know what any particular neuron or circuit does...

I strongly agree. There are two claims here. The weak one is that, if you hold complexity constant, directed acyclic graphs (DAGs; Bayes nets or otherwise) are not necessarily any more interpretable than conventional NNs because NNs are DAGs at that level. I don't think anyone who understands this claim would disagree with it.

But that is not the argument being put forth by Pearl/Marcus/etc. and arguably contested by LeCun/etc.; they claim that in practice (i.e., not holding anything constant), DAG-inspired or symbolic/hybrid AI approaches like Neural Causa...

The limits of AI safety via debate

Thanks for taking the time. I now understand all of your arguments and am convinced that most of my original criticisms are wrong or inapplicable. This has greatly increased my understanding of and confidence in AI safety via debate. Thank you for that. I updated the post accordingly. Here are the updated versions (copied from above):

Re complexity:
Update 2: I misunderstood Rohin’s response. He actually argues that, in cases where a claim X breaks down into claims X1 and X2, the debater has to choose which one is more effective to attack, i.e. it...

4Rohin Shah2mo
The limits of AI safety via debate

Thank you for the detailed responses. You have convinced me of everything but two questions. I have updated the text to reflect that. The two remaining questions are (copied from text):

On complexity: There was a second disagreement about complexity. I argued that some debates actually break down into multiple necessary conditions, e.g. if you want to argue that you played Fortnite you have to show that it is possible to play Fortnite and then that it is plausible that you played it. The pro-Fortnite debater has to show both claims while the anti...

4Rohin Shah2mo
Thanks for making updates! No, that's not what I mean. The idea with debate is that you can have justified belief in some claim X if you see one expert (the "proponent") agree with claim X, and another equally capable expert (the "antagonist") who is solely focused on defeating the first expert is unable to show a problem with claim X. The hope is that the antagonist fails in its task when X is true, and succeeds when X is false. We only give the antagonist one try at showing a problem with claim X. If the support for the claim breaks down into two necessary subcomponents, the antagonist should choose the one that is most problematic; it doesn't get to backtrack and talk about the other subcomponent. This does mean that the judge would not be able to tell you why the other subcomponent is true, but the fact that the antagonist didn't choose to talk about that subcomponent suggests that the human judge would find that subcomponent more trustworthy than the one the antagonist did choose to talk about. I mean, the reason is "if the debater is not truthful, the opponent will point that out, and the debater will lose". This in turn depends on the central claim in the debate paper: In cases where this claim isn't true, I agree debate won't get you the truth. I agree in the "flawed physics" example if you have a short debate then deception is incentivized. As I mentioned in the previous comment, I do think deception is a problem that you would worry about, but it's only in cases where it is easier to lie than to refute the lie. I think it is inaccurate to summarize this as "debate assumes that AI is not deceptive"; there's a much more specific assumption which is "it is harder to lie than to refute a lie" (which is way more plausible-sounding to me at least than "assumes that AI is not deceptive").
The limits of AI safety via debate

Thanks for your detailed comment. Let me ask some clarifications. I will update the post afterward.

Assumption 1:

I understand where you are going but the underlying path in the tree might still be very long, right? The not-Fortnite-debater might argue that you couldn't have played Fortnite because electricity doesn't exist. Then the Fortnite-debater has to argue that it does exist, right?

Furthermore, I don't see why it should just be one path in the tree. Some arguments have multiple necessary conditions/burdens. Why do I not have to prove...

4Rohin Shah2mo
Yes. It doesn't seem like this has to be that long, since you break down the claim into multiple subclaims and only recurse down into one of the subclaims. Again, the 1800-person doesn't have to be shown the full reasoning justifying the existence of electricity, they just have to observe that the opponent debater was unable to poke a hole in the "electricity exists" claim. If the opponent previously claimed that X, and then the debater showed actually not-X, and the opponent says "okay, sure, not-X, but what about Y", they just immediately lose the debate. That is, you tell your human judges that in such cases they should award the victory to the debater that said X. The debater can say "you're moving the goalposts" to make it really obvious to the judge. Yes! It means that there probably exists an exponential-sized tree that produces the right answer, and so debate could plausibly recreate the answer that that reasoning would come to! (I think it is first worth understanding how debate can produce the same answers as an exponential-sized tree. As a simple, clean example, debate in chess with arbitrarily intelligent players but a human judge leads to optimal play, even though if the human computed optimal play using the direct brute force approach it would be done only well after the heat death of the universe.) (Also, Figure 1 in AI Safety Needs Social Scientists [https://distill.pub/2019/safety-needs-social-scientists/] kinda gets at the "implicit tree".) You might be able to teach your dog quantum physics; it seems plausible that in a billion years you could teach your dog how to communicate, use language, have compositional concepts, apply logic, etc, and then once you have those you can explain quantum physics the way you'd explain it to a human. But I agree that debate with dog judges won't work, because the current capabilities of dogs aren't past the universality threshold [https://ai-alignment.com/of-humans-and-universality-thresholds-24b473e0c898].
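
The "only recurse down into one of the subclaims" argument can be made concrete with a toy model. This is a rough sketch under strong assumptions that are mine, not the debate paper's: claims form a full binary tree, a claim is true exactly when all of its subclaims are true, the antagonist knows the ground truth and always attacks a false subclaim if one exists, and the judge can check a leaf claim directly. The names `Claim`, `build_tree`, `count_claims`, and `debate` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    is_true: bool = True  # ground truth, visible to debaters but not to the judge
    subclaims: list = field(default_factory=list)

def build_tree(depth, false_leaf=None, index=0):
    """Full binary tree of claims; leaf number `false_leaf` (left to right) is false."""
    if depth == 0:
        return Claim(f"leaf {index}", is_true=(index != false_leaf))
    kids = [build_tree(depth - 1, false_leaf, 2 * index + i) for i in range(2)]
    # A compound claim holds only if every necessary subclaim holds.
    return Claim(f"claim at depth {depth}, #{index}", all(k.is_true for k in kids), kids)

def count_claims(claim):
    """Number of claims a brute-force check of the whole tree would visit."""
    return 1 + sum(count_claims(s) for s in claim.subclaims)

def debate(claim):
    """Return (judge_accepts_root, number_of_claims_examined)."""
    examined = 1
    while claim.subclaims:
        # The antagonist gets one try per level: it attacks a false subclaim
        # if there is one, otherwise it has nothing better than the first.
        claim = next((s for s in claim.subclaims if not s.is_true), claim.subclaims[0])
        examined += 1
    # The judge only verifies the single leaf the debate ended on.
    return claim.is_true, examined

honest = build_tree(10)
flawed = build_tree(10, false_leaf=700)
print(count_claims(honest))  # size of the full tree
print(debate(honest))        # accepted after examining one path
print(debate(flawed))        # rejected after examining one path
```

In this toy setting the full tree has exponentially many claims in the depth, while the debate touches one claim per level, which is the sense in which a short debate can stand in for an exponential-sized argument.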
Nuclear Energy - Good but not the silver bullet we were hoping for

Thank you for your expertise and criticism.

Just to be clear, we still think nuclear is good and we explicitly phrase it as a high-risk, high-reward bet against the failure of renewables. Furthermore, we very explicitly say that nuclear is better than all fossil alternatives which also implies that we would be in favor of having more nuclear.

I think our overall framing is more like: get away from fossil fuels, a bit more into nuclear and much more into renewables. If you think that is not clear from the text, we should definitely clarify this. Let me know how you understood it.

Nuclear Energy - Good but not the silver bullet we were hoping for

We discuss this in our post. We think it is a plausible story, but it's not clear whether it is decisive. Most people who are critical of the status quo don't see regulation as the primary issue, but I'm personally sympathetic to that view.

Nuclear Energy - Good but not the silver bullet we were hoping for

Unfortunately, we didn't find a good source for that. However, given that fossil fuels usually don't need storage and solar+batteries are dropping exponentially in price, we think both options should be cheaper. But good estimates of that would be very welcome.

1Danny Grossman2mo
See below. I made some estimates based on the literature and analyzed a current large-scale project. Cheers, Danny

solar+batteries are dropping exponentially in price

Pulling the data from this chart from your source:

...and fitting[1] an exponential trend with offset[2], I get:

(Pardon the very rough chart.)

This appears to be a fairly good fit[3], and results in the following trend/formula:

[4]

This is an exponentially-decreasing trend... but towards a decidedly positive horizontal asymptote.
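
Since the fitted formula did not survive in this copy, here is a hedged sketch of what an "exponential trend with offset" generally looks like, assuming the form y(t) = a·exp(−b·t) + c. Only the asymptote c = 37.71 comes from the comment; a, b, the synthetic data, and the fitting routine are hypothetical, and this is one simple way to do such a fit, not necessarily the one used above.

```python
import math

def exp_with_offset(t, a, b, c):
    """y(t) = a * exp(-b * t) + c, where c is the horizontal asymptote."""
    return a * math.exp(-b * t) + c

def fit_exp_with_offset(ts, ys, c_candidates):
    """Scan candidate offsets c; for each, fit log(y - c) = log(a) - b*t by
    least squares and keep the c with the smallest squared error in y."""
    best = None
    n = len(ts)
    for c in c_candidates:
        if any(y <= c for y in ys):
            continue  # log(y - c) is undefined for this offset
        zs = [math.log(y - c) for y in ys]
        t_mean = sum(ts) / n
        z_mean = sum(zs) / n
        sxx = sum((t - t_mean) ** 2 for t in ts)
        sxz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
        slope = sxz / sxx
        a = math.exp(z_mean - slope * t_mean)
        b = -slope
        sse = sum((exp_with_offset(t, a, b, c) - y) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]

# Synthetic, noise-free data from hypothetical a and b and the quoted asymptote.
ts = list(range(10))
ys = [exp_with_offset(t, 300.0, 0.4, 37.71) for t in ts]

a_fit, b_fit, c_fit = fit_exp_with_offset(ts, ys, [i / 100 for i in range(5000)])
far_future = exp_with_offset(1000, a_fit, b_fit, c_fit)
print(a_fit, b_fit, c_fit)
print(far_future)  # approaches c_fit rather than zero
```

The point of the offset term is visible in the last line: however far the curve is extrapolated, it flattens out at c instead of continuing towards zero.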

This essentially indicates that we will get minimal future scaling, if any. $37.71/MWh is already within the given range. For reference, he...

[$20K in Prizes] AI Safety Arguments Competition

Random side note: GPT-3 seems to be able to generate decent one liners.

Generate one-liners describing the dangers of AI. An example is "Inventing machines that are smarter than us is playing with fire."

1. Machines that are smarter than us could easily become our masters.

2. If we're not careful, AI could spell the end of humanity as we know it.

3. AI could be used to create weapons of mass destruction that could devastate the planet.

4. AI could be used to create powerful robots that could enslave humans.

5. AI could be used to create artificial intellig...

1Trevor12mo
1. Machines that are smarter than us could easily become our masters. [All it takes is a single glitch, and they will outsmart us the same way we outsmart animals.]

2. If we're not careful, AI could spell the end of humanity as we know it. [Artificial intelligence improves itself at an exponential pace, so if it speeds up there is no guarantee that it will slow down until it is too late.]

3. AI could be used to create weapons of mass destruction that could devastate the planet. x

4. AI could be used to create powerful robots that could enslave humans. x

5. AI could one day be used to create artificial intelligence [an even smarter AI system] that could turn against its creators [if it becomes capable of outmaneuvering humans and finding loopholes in order to pursue its mission.]

6. AI could usher in a new era of cyber-warfare that could cripple society x

7. AI could create self-replicating robots that could eventually consume all resources on Earth x

8. AI could [can one day] be used to create [newer, more powerful] AI [systems] that could eventually surpass human intelligence and take over the world [behave unpredictably].

9. AI technology could eventually be used to create a global surveillance state where everyone is constantly watched and monitored x
3jcp292mo
Good idea! I could imagine doing something similar with images generated by DALL-E.
6Mitchell Reynolds2mo
I had a similar thought to prompt GPT-3 for one liners or to summarize some article (if available). I think involving the community to write 500-1000 winning submissions would have the positive externality of non-winners to distill/condense their views. My exploratory idea is that this would be instrumentally useful when talking with those new to AI x-risk topics.
Causality, Transformative AI and alignment - part I

Right, but Reichenbach's principle of common cause doesn't tell you anything about how they are causally related? They could just be some nodes in a really large complicated causal graph. So I agree that we can assume causality somehow but we are much more interested in what the graph looks like, right?

1tailcalled5mo
Not necessarily? Reality is really really big. It would be computationally infeasible to work with raw reality. Rather, you want abstractions that cover aggregate causality in a computationally practical way, throwing away most of the causal details. See also this: https://www.lesswrong.com/posts/Gkv2TCbeE9jjMHXKR/reductionism-is-not-the-ultimate-tool-for-causal-inference
Causality, Transformative AI and alignment - part I

You state in the first comment that they can be given causal justification. As far as I understand, your argument above relies on covariances. Can you elaborate on what this causal justification is?

1tailcalled5mo
In a causal universe, if you observe things in different places that correlate with each other, they must have a common cause. That's the principle VAEs/triplet losses/etc. can be understood as exploiting.
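
A small simulation can illustrate the structure behind this principle. It is a sketch under assumptions of my own: a single hidden variable `z` drives two observations `x` and `y` with known unit coefficients and independent Gaussian noise. The two observations correlate even though neither causes the other, and the correlation vanishes once the common cause is subtracted out.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical generative model: z is the common cause of x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

corr_xy = pearson(x, y)

# Remove the common cause's contribution (coefficient 1 is known by construction).
x_res = [xi - zi for xi, zi in zip(x, z)]
y_res = [yi - zi for yi, zi in zip(y, z)]
corr_given_z = pearson(x_res, y_res)

print(corr_xy)       # clearly positive
print(corr_given_z)  # close to zero
```

This only demonstrates the forward direction (a common cause produces correlation). As the question above points out, observing the correlation alone does not reveal what the rest of the causal graph looks like.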
Causality, Transformative AI and alignment - part I

I'm very interested in a collaboration!! Let's switch to DMs for calls and meetings.

What’s the backward-forward FLOP ratio for Neural Networks?

1. We could care about the theoretical number of FLOP a method might use independent of architecture, exact GPU, etc. This might be useful to compare methods independent of which exact setup is the current flavor of the month. It might also be useful to compare current methods against methods from 5 years ago or 5 years from now. Then the setup we describe in the post seems to be the best fit.
2. We could care about the actual FLOP count that a...
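
As a rough illustration of the first, theoretical kind of count, here is a simplified sketch for a single dense layer. It assumes that a matrix multiply of shapes (batch × n_in) and (n_in × n_out) costs 2·batch·n_in·n_out FLOP, and it ignores biases, activations, and the optimizer update. The function name and the layer sizes are hypothetical, and the accounting in the post may differ in its details.

```python
def dense_layer_flop(batch, n_in, n_out, needs_input_grad=True):
    """Theoretical FLOP for one dense layer, counting only matrix multiplies."""
    matmul = 2 * batch * n_in * n_out  # one multiply and one add per term
    forward = matmul                   # activations = inputs @ weights
    backward = matmul                  # gradient with respect to the weights
    if needs_input_grad:
        backward += matmul             # gradient with respect to the inputs
    return forward, backward

# A hidden layer must pass gradients on to the layer before it.
fwd, bwd = dense_layer_flop(batch=64, n_in=1024, n_out=1024)
ratio = bwd / fwd

# The first layer has no earlier layer, so the input gradient can be skipped.
fwd_first, bwd_first = dense_layer_flop(64, 1024, 1024, needs_input_grad=False)
ratio_first = bwd_first / fwd_first

print(ratio, ratio_first)
```

Under these assumptions the backward pass of a hidden layer costs about twice the forward pass, while the first layer comes out closer to 1:1. Measured counts on real hardware, the second kind of count mentioned above, can deviate from this.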

How to measure FLOP/s for Neural Networks empirically?

We will add a second blog post in which we discuss how accurate this rule is under different conditions. It looks like it depends on many factors such as batch size, type of parameters, depth, etc.

What are red flags for Neural Network suffering?

This is definitely a possibility and one we should take seriously. However, I would estimate that the scenario of "says it suffers as deception" needs more assumptions than "says it suffers because it suffers". Using Occam's razor, I'd find the second one more likely. The deception scenario could still dominate an expected value calculation, but I don't think we should entirely ignore the scenario in which it actually suffers.

What are red flags for Neural Network suffering?

It's definitely not optimal. But our goal with these questions is to establish whether GPT-3 even has a consistent model of suffering. If it answers these questions randomly, it seems more likely to me that it does not have the ability to suffer than if it answered them very consistently.

A Guide for Productivity

I think I might make a short version when I feel like I have more practical experience with some of the things I'm suggesting. Maybe in half a year or so. Thanks for the kind feedback and suggestions :)

A Guide for Productivity

Thank you for the valuable feedback. I'll try to improve that more in future posts and add more paragraphs here.

How to Sleep Better

I don't think Whoop influenced my rhythm that much. But I had quite a steady rhythm already, so it wasn't too much of a problem. I think I mostly profited from Whoop by having an assessment of my sleep quality. This made it much easier for me to decide how to prioritize my work, e.g. when I should do intellectually intense stuff and when light work.

I haven't tried any of your other suggestions, e.g. CBT, so I cannot comment on them. But thank you for sharing your experiences.