All of Raemon's Comments + Replies

I'm specifically talking about the reference class of nuclear and bioweapons, which do sometimes involve invasion or threat-of-invasion of sovereign states. I agree that's really rare, something we should not do lightly. 

But I don't think you even need Eliezer-levels-of-P(doom) to think the situation warrants that sort of treatment. The most optimistic people I know of who seem to understand the core arguments say things like "10% x-risk this century", which I think is greater than x-risk likelihood from nuclear war.

8Rob Bensinger11h
I agree with this. I find it very weird to imagine that "10% x-risk this century" versus "90% x-risk this century" could be a crux here. (And maybe it's not, and people with those two views in fact mostly agree about governance questions like this.) Something I wouldn't find weird is if specific causal models of "how do we get out of this mess" predict more vs. less utility for state interference. E.g., maybe you think 10% risk is scarily high and a sane world would respond to large ML training runs way more aggressively than it responds to nascent nuclear programs, but you also note that the world is not sane, and you suspect that government involvement will just make the situation even worse in expectation.
2Matthew Barnett11h
FWIW I also have >10% credence on x-risk this century, but below 1% on x-risk from an individual AI system trained in the next five years, in the sense Eliezer means it (probably well below 1% but I don't trust that I can make calibrated estimates on complex questions at that level). That may help explain why I am talking about this policy in these harsh terms.

The thing I’m pretty worried about here is people running around saying ‘Eliezer advocated violence’, and people hearing ‘unilaterally bomb data centers’ rather than ‘build an international coalition that enforces a treaty similar to how we treat nuclear weapons and bioweapons, and enforce it.’

I hear you saying (and agree with) “guys you should not be oblivious to the fact that this involves willingness to use nuclear weapons.” Yes, I agree very much that it’s important to stare that in the face.

But “a call for willingness to use violence by state actors” is just... (read more)

people hearing ‘unilaterally bomb data centers’ rather than ‘build an international coalition that enforces a treaty similar to how we treat nuclear weapons and bioweapons, and enforce it.’

It is rare to start wars over arms treaty violations. The proposal considered here -- if taken seriously -- would not be an ordinary enforcement action but rather a significant breach of sovereignty almost without precedent within this context. I think it's reasonable to consider calls for preemptive war extremely seriously, and treat it very differently than if one had proposed e.g. an ordinary federal law.

It seems like this makes all proposed criminalization of activities punished by death penalty a call for violence?

Yes! Particularly if it's an activity people currently do. Promoting the death penalty for women who get abortions is calling for violence against women; promoting the death penalty for apostasy from Islam is calling for violence against apostates. I think if a country is contemplating passing a law to kill rapists, and someone says "yeah, that would be a great fuckin law" they are calling for violence against rapists, whether or not it is justified.

I don't really care whether something occurs beneath the auspices of supposed international law. Saying "this co... (read more)

In the past few weeks I've noticed a significant change in the Overton window of what seems possible to talk about. I think the broad strokes of this article seem basically right, and I agree with most of the details.

I don't expect this to immediately cause AI labs or world governments to join hands and execute a sensibly-executed-moratorium. But I'm hopeful about it paving the way for the next steps towards it. I like that this article, while making an extremely huge ask of the world, spells out exactly how huge an ask is actually needed. 

Many people... (read more)

-22Gerald Monroe1d

Yeah, this comment seemed technically true but seems misleading with regards to how people actually use words

It is advocating that we treat it as the class-of-treaty we consider nuclear treaties, and yes that involves violence, but "calls for violence" just means something else.

The use of violence in case of violations of the NPT has been fairly limited and highly questionable in international law. And, in fact, calls for such violence are very much frowned upon because of fear they have a tendency to lead to full-scale war.

No one has ever seriously suggested violence as a response to potential violation of the various other nuclear arms control treaties. 

No one has ever seriously suggested running a risk of nuclear exchange to prevent a potential treaty violation. So, what Yudkowsky is suggesting i... (read more)

These measures let us talk about things like bottlecaps as optimizers much more precisely.

I'm a bit surprised this line came up in counterfactual optimization rather than robustness of optimization. I think the reason a bottlecap isn't an optimizer is that if you change the environment around it, it doesn't keep the water in the bottle. I felt like I understood the counterfactual optimization consideration but don't know how it applies here.

fyi it looks like you have a lot of background reading to do before contributing to the conversation here. You should at least be able to summarize the major reasons why people on LW frequently think AI is likely to kill everyone, and explain where you disagree. 

I'd start reading here: 

(apologies both to julie and romeo for this being kinda blunt. I'm not sure what norms romeo prefers on his shortform. The LessWrong mod team is trying to figure out what to do about the increa... (read more)

I agree. But the point is, in order to do the thing that the CEO actually wants, the AI needs to understand goodness at least as well as the CEO. And this isn't, like, maximal goodness for sure. But to hold up under superintelligent optimization levels, it needs a pretty significantly nuanced understanding of goodness.

I think there is some disagreement between AI camps about how difficult it is to get to the level-of-goodness the CEO's judgment represents, when implemented in an AI system powerful enough to automate scientific research. 

I think the "a... (read more)

2Mark Xu1d
I think this is just not true? Consider an average human, who understands goodness enough to do science without catastrophic consequences, but is not a benevolent sovereign. One reason why they're not a sovereign is because they have high uncertainty about e.g. what they think is good, and avoid taking actions that violate deontological constraints or virtue ethics constraints or other "common sense morality." AIs could just act similarly? Current AIs already seem like they basically know what types of things humans would think are bad or good, at least enough to know that when humans ask for coffee, they don't mean "steal the coffee" or "do some complicated scheme that results in coffee". Separately, it seems like in order for your AI to act competently in the world it does have to have a pretty good understanding of "goodness", e.g. to be able to understand why Google doesn't do more spying on competitors, or more insider trading, or do other unethical but profitable things, etc. (Separately, the AI will also be able to write philosophy books that are better than current ethical philosophy books, etc.) My general claim is that if the AI takes creative catastrophic actions to disempower humans, it's going to know that the humans don't like this, are going to resist in the ways that they can, etc. This is a fairly large part of "understanding goodness", and enough (it seems to me) to avoid catastrophic outcomes, as long as the AI tries to do [its best guess at what the humans wanted it to do] and not [just optimize for the thing the humans said to do, which it knows is not what the humans wanted it to do].
If you don't ever align the developers of the AI, the AI itself will never align. Why do you assume that people are good? The practical application of alignment has to do with "when do you pull the plug?" Some people probably will never pull the plug, and that's where your problem is. If we no longer have the capability to unplug the AI system, that's when notkilleveryonism actually applies. pulltheplugism is something that needs to be worked out in the alignment community before we can reasonably tackle notkilleveryonism.

What do you think will actually happen with the term notkilleveryonism?

I can't think of any reason where AI would want to kill everyone. I can think of plenty of reasons where AI wouldn't want to kill everyone, the strongest being human extinction would severely limit the potential and capabilities that the AI can enjoy. Humans give AI power. Without humans developing AI algorithms, AI will not exist. If you kill the source of your creation, what do you get out of it? "Look I'm so badass" lol People kill other people. People want to be badass in front of other people. People will want to use AI to kill other people. People can cause human extinction. People will program something in the AI they use to kill people that oops causes extinction level event before AI will get to that point by itself.
Attempts to deploy the meme to move the conversation in a more productive direction will stop working I guess.

When you say you're worried about "nonkilleveryoneism" as a meme, you mean that this meme (compared to other descriptions of "existential risk from AI is important to think about") is usually likely to cause this foot-in-mouth-quietly-stop reaction, or that the nature of the foot-in-mouth-quietly-stop dynamic just makes it hard to talk about at all?

I mean that I think why AI ethics had to be split as a term with notkilleveryonism in the first place will simply happen again, rather than notkilleveryonism solving the problem.

This is a bit later-than-usual, but, curated. 

I've continued to appreciate Natalia digging into the details here. The spot check on "did lab and/or wild animals get more obese" seemed pretty significant. I also liked tying everything in at the end to a concrete metaculus prediction.

I'm not sure how to best engage with individual posts, but I had thoughts on the "Alignment" !== "Good" post. 

I agree it's useful for alignment not to be a fully-general-goodness word, and to have specific terminology that makes it clear what you're talking about.

But I think there are desiderata the word originally meant in this context, and I think it's an important technical point that the line between "generically good" and "outer aligned" is kinda vague. I do think there are important differences between them but I think some of the confusion li... (read more)

6Mark Xu3d
But Google didn't want their AIs to do that, so if the AIs do that then the AIs weren't aligned. Same with the mind-hacking. In general, your AI has some best guess at what you want it to do, and if it's aligned it'll do that thing. If it doesn't know what you meant, then maybe it'll make some mistakes. But the point is that aligned AIs don't take creative actions to disempower humans in ways that humans didn't intend, which is separate from humans intending good things.

I still claim this should be three paragraphs. In this case, breaking at section 4 and section 6 seems to carve it at reasonable joints.

1Lauro Langosco4d
Yeah that seems reasonable! (Personally I'd prefer a single break between sentence 3 and 4)

I predict most people will have an easier time reading the second one than the first one, holding their jargon-familiarity constant. (the jargon basically isn't a crux for me at all)

(I bet if we arranged some kind of reading comprehension test you would turn out to do better at reading-comprehension for paragraph-broken abstracts vs single-block abstracts. I'd bet this at like 70% confidence for you-specifically, and... like 97% confidence for most college-educated people)

A few reasons I expect this to be true (other than just generalizing from my e... (read more)

Sure, it could easily be that I'm used to it, and so it's no problem for me. It's hard to judge this kind of thing since at some level it's very subjective and quite contingent on what kind of text you're used to reading.

However, if your post doesn't look like a research article, you might have to format it more like one (and even then it's not guaranteed to get in, see this comment thread).

I interpreted this as saying something superficial about style, rather than "if your post does not represent 100+ hours of research work it's probably not a good fit for archive." If that's what you meant I think the post could be edited to make that more clear.

If the opening section of your essay made it more clear which posts it was talking about I'd probably endorse it (although I'm not super familiar with the nuances of arXiv gatekeeping so am mostly going off the collective response in the comment section)

Another possibility is that LessWrong is swamped with AI safety writing, and so people don't want any more of it unless it's really good. They're craving variety.

I think this is a big part of it. 

I'm also confused about the degree of downvotes. (It's not really new content for LessWrong but I'm happy to see more rationality content on the margin, even if it's re-covering the basics)

(I do think opening with "you have 'zero' chance of being intellectually wise without this" is some combination of "not necessarily true" and "sure sounds like you need to have resolved the ambiguity of what counts as intellectually wise to be sure of that", and wish that line was different)

I downvoted it for the first paragraph alone. The rest gave me no reason to change my mind, and only barely enough reason not to give it a strong downvote.

Yeah this seems plausibly good

Yeah I do think writing a post that actually-tabooed-frame-control would be good. (The historical reason this post doesn't do that is in large part because I initially wrote a different post, called "Distinctions in Frame Control". I realized that post didn't quite have enough of a purpose, and sort of clarified my goal at the last minute and then hastily retrofitted the post to make it work.)

Indeed, I found myself sufficiently impatient to read such a post that I wrote it myself

FWIW I did quite appreciate that comment. I may have more to say about it later... (read more)

So, recap that I think the word "frame" is used metaphorically for three different things:

  • "what parts of reality to pay attention to" (window frame)
  • "what's the purpose of the conversation? the context? the goal?" (picture frame)
  • "what sort of structure are we talking about and what sort of things plug into it" (framework)

For "everything is coordination + cryptography" guy, I'm thinking mostly in terms of "framework" (although frameworks tend to also imply which-parts-of-reality-to-pay-attention-to). 

The way they model society routes through a structure... (read more)

FYI, I updated this post somewhat in response to some of your comments here (as well as some other commenters in other venues like FB and my workplace slack). The current set of updates is fairly small (adding a couple sentences and changing wordings). But there's a higher level problem that I think requires reworking the post significantly. I'm probably just going to write a followup post optimized a bit differently.

In this post I was deliberately trying not to be too opinionated about which things "count as frame control", "is frame control bad?" or what... (read more)

2Said Achmiz6d
This comment and (the last two paragraphs of) this comment may clarify my view on the matter somewhat. Well, quite frankly, I think that the version of this post that I’d find most satisfying is one that actually tabooed “frames” and “frame control”, while attempting to analyze what it is that motivates people to talk about such things as these discussions of “frame control” tend to describe (in the spirit of “dissolving questions” by asking what algorithm generates the question, rather than taking the question’s assumptions for granted). Indeed, I found myself sufficiently impatient to read such a post that I wrote it myself… I remain unconvinced that there’s anything further that’s worth saying about any of this that wouldn’t be best said by discarding the entire concept of “frame control”, and possibly even “frames”, starting from scratch, and seeing if there remains any motivation to say anything. So, in that sense, yes, I think your characterization is more or less correct.

I buy that people who read abstracts all day get better at reading them, but I'm... pretty sure they're just kinda objectively badly formatted and this'd at least save time learning to scan it. 

Like looking at the one you just linked

The ATLAS Fast TracKer (FTK) was designed to provide full tracking for the ATLAS high-level trigger by using pattern recognition based on Associative Memory (AM) chips and fitting in high-speed field programmable gate arrays. The tracks found by the FTK are based on inputs from all modules of the pixel and silicon microstr

... (read more)
I genuinely don't see a difference either way, except the second one takes up more space. This is because, like I said, the abstract is just a simple list of things that are covered, things they did, and things they found. You can put it in basically any format, and as long as it's a field you're familiar with so your eyes don't glaze over from the jargon and acronyms, it really doesn't make a difference. Or, put differently, there's essentially zero cognitive load to reading something like this because it just reads like a grocery list to me. Regarding the latter: I generally agree. The problem isn't so much that scientists aren't trying. Science communication is quite hard, and to be quite honest scientists are often not great writers simply because it takes a lot of time and training to become a good writer, and a lifetime is only 80 years. You have to recognise that scientists generally try quite hard to make papers readable, they/we are just often shitty writers and often are even non-native speakers (I am a native speaker, though of course internationally most scientists aren't). There are strong incentives to make papers readable since if they aren't readable they won't get, well, read, and you want those citations.  The reality I think is if you have a stronger focus on good writing, you end up with a reduced focus on science, because the incentives are already aligned quite strongly for good writing.

I mean the control group here is "not doing evals", which eventually autofails.

Do you have a link to a specific part of the gwern site highlighting this, and/or a screenshot?

What's there to highlight, really? The point is that it looks like a normal abstract... but not one-paragraph. (I've mused about moving in a much more aggressive Elicit-style direction and trying to get a GPT to add the standardized keywords where valid but omitted. GPT-4 surely can do that adequately.) I suppose if you want a comparison, skimming my newest [], the first entry right now is Sánchez-Izquierdo et al 2023 [] and that is an example of reformatting an abstract to add linebreaks which improve its readability: This is not a complex abstract and far from the worst offender, but it's still harder to read than it needs to be. It is written in the standard format, but the writing is ESL-awkward (the 'one of those' clause is either bad grammar or bad style), the order of points is a bit messy & confusing (defining the hazard ratio - usually not written in caps - before the point of the meta-analysis or what it's updating? horse/cart), and the line-wrapping does one no favors. Explicitly breaking it up into intro/method/results/conclusion makes it noticeably more readable. (In addition, this shows some of the other tweaks I usually make: like being explicit about what 'Calvin' is, avoiding the highly misleading 'significance' language, avoiding unnecessary use of obsolete Roman numerals (newsflash, people: we have better, more compact, easier-to-read numbers - like '1' & '2'!), and linking fulltext rather than contemptuously making the reader fend for themselves even though one could so easily have linked it).

I... kinda want to ping @Jeffrey Ladish about how this post uses "play to your outs", which is exactly the reason I pushed against that phrasing a year ago in Don't die with dignity; instead play to your outs.

A high level thing about LessWrong is that we're primarily focused on sharing information, not advocacy. There may be a later step where you advocate for something, but on LessWrong the dominant mode is discussing / explaining it, so that we can think clearly about what's true. 

Advocacy pushes you down a path of simplifying ideas rather than clearly articulating what's true, and pushing for consensus for the sake of coordination regardless of whether you've actually found the right thing to coordinate on.

"What is the first step towards alignment" isn'... (read more)

1Lucas Pfeifer7d
"Advocacy pushes you down a path of simplifying ideas rather than clearly articulating what's true, and pushing for consensus for the sake of coordination regardless of whether you've actually found the right thing to coordinate on." 1. Simplifying (abstracting) ideas allows us to use them efficiently. 2. Coordination allows us to combine our talents to achieve a common goal. 3. The right thing is the one which best helps us achieve our cause. 4. Our cause, in terms of alignment, is making intelligent machines that help us. 5. The first step towards helping us is not killing us. 6. Intelligent weapons are machines with built-in intelligence capabilities specialized for the task of killing humans. 7. Yes, a rogue AI could try to kill us in other ways: bioweapons, power grid sabotage, communications sabotage, etc. Limiting the development of new microorganisms, especially with regards to AI, would also be a very good step. However, bioweapons research requires human action, and there are very few humans that are both capable and willing to cause human extinction. Sabotage of civilian infrastructure could cause a lot of damage, especially the power grid, which may be vulnerable to cyberattack. []  8. Human mercenaries causing a societal collapse? That would mean a large number of individuals who are willing to take orders from a machine to actively harm their communities. Very unlikely. 9. The more human action that an AI requires to function, the more likely a human will notice and eliminate a rogue AI. Unfortunately, the development of weapons which require less human action is proceeding rapidly. 10. Suppose an LLM or other reasoning model were to enter a bad loop, maybe as the result of a joke, in which it sought to destroy humanity. Su


  • Cyborgism (this is framed through the context of "alignment progress" but I think is generally relevant for humans staying in the loop / in-control)
  • Cyborg Periods: There will be multiple AI transitions (has an interesting frame wherein for each domain, there's a period where humans are more powerful than AIs, a period where human + AI is more powerful than AI, and a period where pure AIs just dominate)

Mod note. (LW mods are trying out moderating in public rather than via PMs. This may feel a bit harsh if you're not used to this sort of thing, but we're aiming for a culture where feedback feels more natural. I think this is important to do publicly for a) accountability and b) so people can form a better model of how the LW moderators operate)

I do think globally banning autonomous weapons is a reasonable idea, but the framing of this post feels pretty off.

I downvoted for the first paragraph, which makes an (IMO wrong) assumption that this is the first step to... (read more)

2Lone Pine7d
My very similar post had a somewhat better reception, although certainly people disagreed. I think there are two things going on. Firstly, Lucas's post, and perhaps my post, could have been better written. Secondly, and this is just my opinion, people coming from the orthodox alignment position (EY) have become obsessed with the need for a pure software solution, and have no interest in shoring up civilization's general defenses by banning the most dangerous technologies that an AI could use. As I understand, they feel that focus on how the AI does the deed is a misconception, because the AI will be so smart that it could kill you with a butter knife and no hands. Possibly the crux here is related to what is a promising path, what is a waste of time, and how much collective activism effort we have left, given time on the clock. Let me know if you disagree with this model.
1Lucas Pfeifer7d
1. How is the framing of this post "off"? It provides an invitation for agreement on a thesis. The thesis is very broad, yes, and it would certainly be good to clarify these ideas. 2. What is the purpose of sharing information, if that information does not lead in the direction of a consensus? Would you have us share information simply to disagree on our interpretation of it? 3. The relationship between autonomous weapons and existential risk is this: autonomous weapons have built-in targeting and engagement capabilities.  If we could make an analogy to a human warrior, in a rogue AI scenario, any autonomous weapons to which the AI gained access would serve as the 'sword-arm' of the rogue AI, while a reasoning model would provide the 'brains' to direct and coordinate it.  The first step towards regaining control would be to disarm the rogue AI, as one might disarm a human, or remove the stinger on a stingray.  The more limited the weaponry that the AI has access to, the easier it would be to disarm.

Yeah. I had a goal with the "Keep your beliefs cruxy and your frames explicit" sequence to eventually suggest people do this for this reason (among others), but hadn't gotten around to that yet. I guess this new post is maybe building towards a post on that.

It's also hard because, as you note elsewhere, demanding explicitness can be its own form of invalidation and disempowerment.
Answer by RaemonMar 22, 2023216

Actual answer is that Eliezer has tried a bunch of different things to lose weight and it's just pretty hard. (He also did a quite high-effort thing in 2019 which did work. I don't know how well he kept the pounds off in the subsequent time)

You can watch a fun video where he discusses it after the 2019 Solstice here.

(I'm not really sure how I feel about this post. It seems like it's coming from an earnest place, and I kinda expect other people to have this question, but it's in a genre that feels pretty off to be picking on individual people about and I ... (read more)

2Rafael Harth7d
I'm kinda confused why this is only mentioned in one answer, and in parentheses. Shouldn't this be the main answer -- like, hello, the premise is likely false? (Even if it's not epistemically likely, I feel like one should politely not assume that he since gained weight unless one has evidence for this.)

and/or exert pressure to fall in line with that frame

This line makes me realize I was missing one subcomponent of frame control. We have

  • Strong frames
  • Persistent Insistent Frames
  • Manipulating frames (i.e. tricking people into adopting a new frame)

But then there's "pressure/threaten someone into adopting a frame". The line between pressure and "merely expressing confidence" might feel blurry in some cases, but the difference is intended to be "there's an implication that if you don't adopt the frame, you will be socially punished". 

Interesting, I was thinking of that as basically in the same category as "persistent insistent frames"!

Yeah, basically agreed that this is what's going on.

I agree that listening in a collaborative way is a good thing to do when you have a friend/colleague in this situation.

I'm not sure what to do in the context of this post, if the problem comes up organically. The collaborative listening thing seems to work best in a two-person pair, not an internet forum. I guess "wait for it to come up" is fine.

I had a discussion on Facebook about this post, where someone felt my examples seemed pointed at a different definition of frame control than theirs. After some back-and-forth and some confusion on my part, it seemed like their conception of frame control was something more like 'someone is trying to control you, and they happen to be using frames to do it', whereas my conception here was more like 'someone is trying to control your frame.'

I'm not actually sure how different these turn out to be in practice. If someone is controlling your frame, they're al... (read more)

Notes: this was tagged 'effective altruism', but on LessWrong 'effective altruism' tag is used to talk about the movement at a meta level, and this post should be classified as 'world optimization'.

A thing that occurs to me, as I started engaging with some comments here as well as on a FB thread about this:

Coercion/Abuse/Manipulation/Gaslighting* often feel traumatic and triggering, which makes talking about them hard.

One of the particular problems with manipulation is that it's deliberately hard to talk about or find words to explain what's wrong about it. (if you could easily point to the manipulation, it wouldn't be very successful manipulation). Successful manipulators tailor their manipulations towards precisely the areas where their marks don't... (read more)

This sounds like when you have a pre-verbal understanding (felt sense) of something, and people are like: "if you cannot immediately translate it to legible words, it is not legit". Problem is, even if you do your best to translate it to words immediately, those words will most likely be wrong somehow. Pointing out the problem with the (prematurely chosen) words will then be used to dismiss the feeling as a signal. You still know that the feeling is a signal of something, but under such circumstances it becomes impossible to figure out what exactly. The nice thing would be instead to listen, and maybe collaborate on finding the words, which is an iterative process of someone proposing the words, and you providing feedback on what fits and what does not.

What I think is problematic is that some people are able to make genuine threats to get their way, enforcing compliance with their values and language and preferences and norms 

One of my main points here is that I think we probably should call threatening behavior "threatening" and maybe "coercive" or "abusive" or whatever seems appropriate for the situation, and only use the phrase 'frame control' when the most relevant thing is that someone is controlling a frame. (And, maybe, even then try to say a more specific thing about what they're doing, if y... (read more)

The adjective “manipulatively” here seems like it is not justified by the preceding description.

The intended justification is the previous sentence:

Years later looking back, you might notice that they always changed the topic, or used various logical fallacies/equivocations, or took some assumptions for granted without ever explaining them.

I'm surprised you don't consider that sort of thing manipulative. Do you not?

6Said Achmiz11d
I didn’t call attention to this in the grandparent comment, but: note that I used the phrase “culpably bad” (instead of simply “bad”) deliberately. Of course it’s bad to commit logical fallacies, to equivocate, etc. As a matter of epistemic rationality, these things are clearly mistakes! Likewise, as a pragmatic matter, failing to properly explain assumptions means that you will probably fail to create in your interlocutors a full and robust understanding of your ideas. But to call these things “manipulative”, you’ve got to establish something more than just “imperfect epistemic rationality”, “sub-optimal pedagogy”, etc. You’ve got to have some sort of intent to mislead or control, perhaps; or some nefarious goal; or some deliberate effort to avoid one’s ideas being challenged; or—something, at any rate. By itself, none of this is “manipulation”! Now, the closest you get to that is the bit about “they always changed the topic”. That seems like it probably has to be deliberate… doesn’t it? Well, it’s a clearly visible red flag, anyway. But… is this all that’s there? I suspect that what you’re trying to get at is something like: “having noticed a red flag or two, you paid careful attention to the guru’s words and actions, now with a skeptical mindset; and soon enough it became clear to you that the ‘imperfections of reasoning’ could not have been innocent, the patterns of epistemic irrationality could not have been accidents, the ‘honest mistakes’ were not honest at all; and on the whole, the guy was clearly an operator, not a sincere truth-seeker”. And that’s common enough (sadly), and certainly very important to learn how to notice. But what identifies these sorts of situations as such is the actual, specific patterns of behavior (like, for instance, “you correct the guru on something and they accept your correction, but then the next day they say the same wrong things to other people, acting as if their conversation with you never happened”). You can’t get th

Yeah this variant does feel more like explicit frame control (I think "frame manipulation", although it feels like it strains a bit with the cluster I'd originally been thinking of when I described it)

Next I asked it:


It responded with this image:


<svg width="300" height="300" viewBox="0 0 300 300" xmlns="http://www.w3.org/2000/svg">
  <!-- Background circle -->
  <circle cx="150" cy="150" r="140" fill="none" stroke="black" stroke-width="2"/>

  <!-- Body -->
  <ellipse cx="150" cy="100" rx="30" ry="40" fill="none" stroke="black" stroke-width="2"/>
  <rect x="140" y="140" width="20" height="60" fill="none" stroke="black" stroke-width="2"/>
  <line x1="100" y1="140" x2="200" y2="140" stroke="black" stroke-w
...

The lecturer talks about how objects move, without reference to the emotions of people around them or what spirits think.

Something I like about this is that "without reference to the emotions of people around them" is actually legitimately a contender for "meaningful frame." Like, cars move because people decide to drive them, soil gets moved around because humans wanted nicer landscaping, dams get built because beavers decided to build them.

Eventually Jupiter might get disassembled because powerful AI decided to. This will not necessarily route through...

Here was the final one:

<svg viewBox="0 0 800 600" xmlns="http://www.w3.org/2000/svg">
  <!-- Field -->
  <rect x="0" y="0" width="100%" height="100%" fill="#8BC34A"/>
  <!-- Sky and sun -->
  <rect x="0" y="0" width="100%" height="40%" fill="#90CAF9"/>
  <circle cx="700" cy="100" r="50" fill="#FFEB3B"/>
  <!-- Mountains -->
  <polygon points="100,300 300,100 500,300" fill="#BDBDBD"/>
  <polygon points="350,400 550,200 750,400" fill="#9E9E9E"/>
  <!-- Castle -->
  <rect x="200" y="150" width="2
...

I tried again, accidentally using GPT3.5 this time, which initially gave something really lame, but then I said "more realistic please", and it gave me:

Answer by Raemon · Mar 20, 2023 · 30

Note that ASCII art isn't the only kind of art. I just asked GPT4 and Claude to both make SVGs of a knight fighting a dragon.

Here's Claude's attempt:

And GPT4's:

I asked them both to make it more realistic. Claude responded with the exact same thing with some extra text, GPT4 returned:

I followed up by asking it for more muted colors and a simple background, and it returned:

Next I asked it:

It responded with this image:

code:

<svg width="300" height="300" viewBox="0 0 300 300" xmlns="http://www.w3.org/2000/svg">
  <!-- Background circle -->
  <circle cx="150" cy="150" r="140" fill="none" stroke="black" stroke-width="2"/>

  <!-- Body -->
  <ellipse cx="150" cy="100" rx="30" ry="40" fill="none" stroke="black" stroke-width="2"/>
  <rect x="140" y="140" width="20" height="60" fill="none" stroke="black" stroke-width="2"/>
  <line x1="100" y1="140" x2="200" y2="140" stroke="black" stroke-width="2"/>

  <!-- Legs -->
  <line x1="150" y1="200" x2="110" y2="270" stroke="black" stroke-width="2"/>
  <line x1="150" y1="200" x2="190" y2="270" stroke="black" stroke-width="2"/>
  <line x1="150" y1="200" x2="90" y2="270" stroke="black" stroke-width="2"/>
  <line x1="150" y1="200" x2="210" y2="270" stroke="black" stroke-width="2"/>

  <!-- Arms -->
  <line x1="140" y1="140" x2="70" y2="140" stroke="black" stroke-width="2"/>
  <line x1="160" y1="140" x2="230" y2="140" stroke="black" stroke-width="2"/>
  <line x1="140" y1="140" x2="60" y2="80" stroke="black" stroke-width="2"/>
  <line x1="160" y1="140" x2="240" y2="80" stroke="black" stroke-width="2"/>

  <!-- Head -->
  <circle cx="150" cy="100" r="20" fill="none" stroke="black" stroke-width="2"/>
</svg>

Then I asked if it could do the Vitruvian Man as Ascii art, and it said:

  O
-|-|-
  |
 /|\
 / \

Any chance you still have the generated SVGs, not just the resulting bitmap render?

Do you have particular examples of non-profound ideas you think are being underexplored?

Relevance of prior Theoretical ML work to alignment, research on obfuscation in theoretical cryptography as it relates to interpretability, theory underlying various phenomena such as grokking. Disclaimer: This list is very partial and just thrown together.

I wanna flag the distinction between "deep" and "profound". They might both be subject to the same bias you articulate here, but I think they have different connotations, and I think important ideas are systematically more likely to be "deep" than they are likely to be "profound." (i.e. deep ideas have a lot of implications and are entangled with more things than 'shallow' ideas. I think profound tends to imply something like 'changing your conception of something that was fairly important in your worldview.')

i.e. profound is maybe "deep + contrarian"

Hm, yeah that seems like a relevant and important distinction.

This post was oriented around the goal of "be ready to safely train and deploy a powerful AI". I felt like I could make the case for that fairly straightforwardly, mostly within the paradigm that I expect many AI labs are operating under.

But one of the reasons I think it's important to have a strong culture of safety/carefulness is the leadup to strong AI. I think the world is going to be changing rapidly, and that means your organization may need to change strategies quickly and track its various effects on society.

Some examples of problem...
