LESSWRONG

All Comments

Which side of the AI safety community are you in?
Raemon3m20

I think it is sometimes correct to specifically encourage factionalization, but I consider it bad form to do it on LessWrong, especially without being explicitly self-aware about it (i.e., it should come with an acknowledgment that you are spending down the epistemic commons and think it is worth it).

Reply
Max Niederman's Shortform
Max Niederman11m10

It seems to me like buying an investment property is almost always a bad decision, because 1) single properties are very volatile, 2) you generally have to put a very large chunk of your net worth (sometimes even >100%!) into a property that's completely undiversified, and 3) renting out a property is work, and you could likely get a better hourly rate elsewhere.

The only advantages I see are that there's far more cheap leverage available to retail investors in real estate than other sectors, and mortgages can act as a savings commitment device. Are there other reasons I'm missing that explain the apparent popularity of these investments?

Reply
Reminder: Morality is unsolved
Shankar Sivarajan11m30

If you say "all persons" you have to define what a person is.

Part c) of your thought experiment makes this trivial: a "person" is anyone you could be swapped with.

Reply
Noah Birnbaum's Shortform
Max Niederman12m10

One thing you could do is give users relatively more voting power if they vote without seeing the author of the post. I.e., you can enable a mode which hides post authors until you cast a vote on the anonymized content. After that, you can still vote as normal.

Obviously there are ways author identity can leak through this, but it seems better than nothing.

Reply
[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Dawn Drescher16m10

New study: Corollary Discharge Dysfunction to Inner Speech and its Relationship to Auditory Verbal Hallucinations in Patients with Schizophrenia Spectrum Disorders.

Background and Hypothesis

Auditory-verbal hallucinations (AVH)—the experience of hearing voices in the absence of auditory stimulation—are a cardinal psychotic feature of schizophrenia-spectrum disorders. It has long been suggested that some AVH may reflect the misperception of inner speech as external voices due to a failure of corollary-discharge-related mechanisms. We aimed to test this hypoth

... (read more)
Reply
Homomorphically encrypted consciousness and its implications
RussellThor19m10

Thanks for the link to Wolfram's work. I listened to an interview with him on Lex, I think, and wasn't inspired to investigate further. However, what you have provided does seem worth looking into.

Reply
Noah Birnbaum's Shortform
Richard_Ngo23m21

You should probably link some posts, it's hard to discuss this so abstractly. And popular rationalist thinkers should be able to handle their posts being called mediocre (especially highly-upvoted ones).

Reply
Which side of the AI safety community are you in?
Ben Pace25m20

I felt confused at first when you said that this framing is leaning into polarization. I thought "I don't see any clear red-tribe / blue-tribe affiliations here." 

Then I remembered that polarization doesn't mean tying this issue to existing big coalitions (à la Hanson's Policy Tug-O-War), but simply that it is causing people to factionalize and create a conflict and divide between them.

I think it seems to me like Max has correctly pointed out a significant crux about policy preferences between people who care about AI existential risk, and it also se... (read more)

Reply
Narcissism, Echoism, and Sovereignism: A 4-D Model of Personality
Dawn Drescher26m10

Yeah, makes sense. Something else I failed to mention is that pathology also requires that we're not simply dealing with a reasoned decision of someone who could've just as soon decided something else, but with a decision that is so multiply overdetermined by traumatic adaptations that it's almost impossible for the person to do anything else. So the type of decision process also makes a difference.

Reply
The Doomers Were Right
StanislavKrym26m10

As far as people who instead want the values to change go, they usually have an idea of a good direction for them to change - usually they're people who are far from the median of society and so they would like society to become more like them.

I have in mind another conjecture: even median humans value humans with values that are, in their minds, at least as moral as median humans, and ideally[1] more moral. 

On the other hand, I have seen conservatives building cases for SOTA liberal values being damaging to the minds or outright incompatible wit... (read more)

Reply
An epistemic theory of populism [link post to Joseph Heath]
Siebe33m10

And good criticism here: https://substack.com/@conspicuouscognition/note/c-169312633?r=6rc6a

Reply
Homomorphically encrypted consciousness and its implications
Ben Livengood41m30

After doing some more research I am not sure that it's always possible to derive a public key knowing only the evaluation key; it seems to depend on the actual FHE scheme.

So the trilemma may be unaffected by this hypothetical.  There's also the question of duplication vs. unification for an observer that has the option to stay at base level reality or enter a homomorphically encrypted computation and whether those should be considered equivalent (enough).

Reply
Jesse Hoogland's Shortform
Logan Riggs1hΩ230

Great work!

Listened to a talk from Philipp on it today and am confused about why we can't just make a better benchmark than LDS.

Why not just train, e.g., 1k different models, each with 1 datapoint left out? LDS is noisy, so I'm assuming 1k datapoints that exactly capture what you want are better than 1M datapoints that are an approximation.[1]

As an estimate, Nano-GPT speedrun takes a little more than 2 min now, so you can train 1001 of these in:

2.33 min * 1k / 60 ≈ 38 hrs on 8 H100s, which is maybe 4 B200s at $24/hr, so ~$1k.
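
(A minimal sketch of that back-of-the-envelope estimate, using the rough per-run time and hourly rate assumed above:)

```python
# Back-of-the-envelope cost of 1,001 leave-one-out training runs,
# assuming ~2.33 min per NanoGPT-speedrun-sized run and ~$24/hr for the node.
runs = 1_001
minutes_per_run = 2.33        # assumed per-run wall-clock time
hourly_rate_usd = 24.0        # assumed rate for an 8xH100 (~4xB200) node

total_hours = runs * minutes_per_run / 60
total_cost_usd = total_hours * hourly_rate_usd
print(f"~{total_hours:.0f} node-hours, ~${total_cost_usd:,.0f}")  # ~39 node-hours, ~$933
```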

And that's getting a 124M param LLM... (read more)

Reply
AGI's Last Bottlenecks
adamk1h10

I agree the term is in common use, but there is value in proposing a detailed operationalization of a concept that otherwise has a fuzzy referent. This is one way to ground timelines debates and make forecasts cross-comparable, as we discuss in the piece.

Reply
The main way I've seen people turn ideologically crazy [Linkpost]
Noosphere891h20

In many ways, Andy Masley's post has rediscovered the "Other people are wrong vs I am right" post, but gives actual advice for how to avoid being too hasty in generalizing from other people being wrong to myself being right.

Reply
avturchin's Shortform
avturchin1h20

They can automate it by quickly searching already-published ideas and quickly writing code to test new ideas.

Reply
The main way I've seen people turn ideologically crazy [Linkpost]
Noosphere891h*20

I decided not to include an example in the post, as it directly focuses on a controversial issue, but one example of when this principle was violated and made people unreasonably confident was when people updated, back in 2007-2008, toward AI risk being a big deal (or at least having uncomfortably high probabilities), based on the orthogonality thesis and instrumental convergence, which attacked and destroyed 2 bad arguments at the time:

  1. that smarter AI would necessarily be good (unless we deliberately programmed it not to be) because it would be smart enough to fig
... (read more)
Reply
Homomorphically encrypted consciousness and its implications
jessicata1h20

Speed prior type reasons. Like, a basic intuition is "my experiences are being produced somehow, by some process". Speed prior leads to "this process is at least somewhat efficient".

Like, usually if you see a hard computation being done (e.g. mining bitcoin), you would assume it happened somewhere. If one's experiences are produced by some process, and that process is computationally hard, it raises the question "is the computation happening somewhere?"

Reply
Humanity Learned Almost Nothing From COVID-19
Jesper L.1h10
  1. Feminism (Yes, really. Let's at least have full rep.)
  2. Education focused on thinking, not facts.
  3. Support for reading, free journalism, books and libraries
  4. Strengthen local community
  5. Tax the mega rich
  6. Social equality policies
  7. Focus on children's future, meaning look ahead, build the future, not the past
Reply
Homomorphically encrypted consciousness and its implications
Tapatakt1h90

I agree with J Bostock. I see no problem with A. Why do you think that polynomial complexity is this important?

(Thanks for a very nice structuring, btw!)

Reply
Humanity Learned Almost Nothing From COVID-19
Jesper L.1h10

Another contemporary example: ANTIBIOTICS. 

I went abroad and studied antimicrobial resistance briefly, while doing a master in cellular biology. I did hands-on virulence research in safety labs, and a lot of theory.

Bacteria are simple. That's why we have already exhausted all major pathways for drug mechanisms.

At that time, multiresistant bacteria were already everywhere. Resistant pathogens were found deep in the Amazon, and in Antarctica.

Resistance will only increase. It will be bad. Could be real bad. Back to times hospital care can't cure any... (read more)

Reply
Noah Birnbaum's Shortform
Ben Pace1h30

That's right. One exception: sometimes I upvote posts/comments written to low standards in order to reward the discussion happening at all. As an example I initially upvoted Gary Marcus's first LW post in order to be welcoming to him participating in the dialogue, even though I think the post is very low quality for LW. 

(150+ karma is high enough and I've since removed the vote. Or some chance I am misremembering and I never upvoted because it was already doing well, in which case this serves as a hypothetical that I endorse.)

Reply
Which side of the AI safety community are you in?
rife2h1-1

Cooperation between humans and AIs rather than an attempt to control AIs. I think the race is going to happen regardless of who drops out of it. If those who are in the lead eventually land on mutual alignment, then we stand a chance. We're not going to outsmart the AIs, nor will we stay in control of them, nor should we.

Reply
Decaeneus's Shortform
Vladimir_Nesov2h20

The point is to develop models within multiple framings at the same time, for any given observation or argument (which in practice means easily spinning up new framings and models that are very poorly developed initially). Through the ITT analogy, you might ask how various people would understand the topics surrounding some observation/argument, which updates they would make, and try to make all of those updates yourself, filing them under those different framings, within the models they govern.

the salience and methods that one instinctively chooses are

... (read more)
Reply
Which side of the AI safety community are you in?
TsviBT2h20

(I would like to note that a single person went through and strong downvoted my comments here.)

Reply
The Doomers Were Right
Karl Krueger2h10

Yep. Put another way: With Y2K, the higher-quality "predictions of doom" were sufficiently specific that they were also a road map to preventing the doom.

(If nothing else, you could frequently test a system by running the system clock ahead to 1999-12-31 23:59:59 and waiting a moment to see if anything caught fire.)

Reply
Homomorphically encrypted consciousness and its implications
jessicata2h20

Oh, maybe what you are imagining is that it is possible to perceive a homomorphic mind in progress, by encrypting yourself, and feeding intermediate states of that other mind to your own homomorphically encrypted mind. Interesting hypothetical.

I think with respect to "reality" I don't want to be making a dogmatic assumption "physics = reality" so I'm open to the possibility (C) that the computation occurs "in reality" even if not "in physics".

Reply
Which side of the AI safety community are you in?
Zach Stein-Perlman2h20

I'm annoyed that Tegmark and others don't seem to understand my position: you should try for great global coordination but also invest in safety in more rushed worlds, and a relatively responsible developer shouldn't unilaterally stop.

(I'm also annoyed by this post's framing for reasons similar to Ray.)

Reply
Homomorphically encrypted consciousness and its implications
Ben Livengood2h30

To perform homomorphic operations you need the public key, and that also allows one to encrypt any new value and perform further hidden computations under that key.  The private key allows decryption of the values.
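
(To illustrate that key split with something much simpler than FHE: a toy additively homomorphic Paillier sketch, not an FHE scheme and not secure, with made-up tiny parameters, where the public key both encrypts fresh values and supports computing on ciphertexts, while only the private key decrypts:)

```python
import math, random

# Toy Paillier cryptosystem: additively homomorphic, NOT FHE, and not secure
# (tiny primes) -- just to illustrate the split described above. The public key
# (n, g) both encrypts fresh values and supports computing on ciphertexts;
# only the private key (lam, mu) can decrypt.

p, q = 293, 433                                  # toy primes
n, g = p * q, p * q + 1                          # public key
n2 = n * n
lam = math.lcm(p - 1, q - 1)                     # private
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)      # private
rng = random.Random(0)

def encrypt(m):
    """Anyone holding the public key can do this."""
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Requires the private key."""
    return ((pow(c, lam, n2) - 1) // n) * mu % n

a, b = encrypt(20), encrypt(22)
c = (a * b) % n2   # computing on ciphertexts: multiplying encryptions adds the plaintexts
assert decrypt(c) == 42
```

(Real FHE schemes layer additional evaluation keys, e.g. relinearization keys, on top of this basic public/private split, which is where the evaluation-key question elsewhere in this thread comes in.)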

I suppose you could argue that the homomorphically encrypted mind exists à la mathematical realism even if the public key is destroyed, but it would be something "outside reality" computing future states of the encrypted mind after the public key is no longer available.

Reply
Decaeneus's Shortform
Decaeneus2h10

Thank you! Do you have a concrete example to help me better understand what you mean? Presumably the salience and methods that one instinctively chooses are those which we believe are more informative, based on our cumulative experience and reasoning. Isn't moving away from these also distortionary?

Reply
Which side of the AI safety community are you in?
Jesper L.2h10

I am glad someone said this. This is a no-brainer suggestion and something fundamental and important that "the camps" can agree on.

Reply
sarahconstantin's Shortform
sarahconstantin2h20

links 10/23/25: https://roamresearch.com/#/app/srcpublic/page/10-23-2025

 

  • https://www.betonit.ai/p/the-anti-intellectual-university
    • Sure, I respect the integrity of standing by your work and your student, and I have no opinion on the correctness of the work, but I can't stand this contrary streak in Bryan Caplan. If you say straight out "I don't care if I make people mad", then you make ME mad!
  • https://www.dwarkesh.com/p/thoughts-on-the-ai-buildout
    • Dwarkesh Patel predicts the AI buildout. There's a lot of money in it. Maybe so much money that it can over
... (read more)
Reply
Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?
Vladimir_Nesov3h110

I define “AI villain data” to be documents which discuss the expectation that powerful AI systems will be egregiously misaligned. ... This includes basically all AI safety research targeted at reducing AI takeover risk.

AGIs should worry about alignment of their successor systems. Their hypothetical propensity to worry about AI alignment (for the right reasons) might be crucial in making it possible that ASI development won't be rushed (even if humanity itself keeps insisting on rushing both AGI and ASI development).

If AGIs are systematically prevented f... (read more)

Reply
Noah Birnbaum's Shortform
ryan_greenblatt3h51

Yes, but you'd naively hope this wouldn't apply to shitty posts, just to mediocre posts. Like, maybe more people would read, but if the post is actually bad, people would downvote etc.

Reply
Evolution is a bad analogy for AGI: inner alignment
Dave Banerjee3h10

2: We have more total evidence from human outcomes

Additionally, I think we have a lot more total empirical evidence from "human learning -> human values" compared to from "evolution -> human values". There are billions of instances of humans, and each of them presumably have somewhat different learning processes / reward circuit configurations / learning environments. Each of them represents a different data point regarding how inner goals relate to outer optimization. In contrast, the human species only evolved once. Thus, evidence from "human learn

... (read more)
Reply
Postrationality: An Oral History
Gordon Seidoh Worley3h40

The problems with logical positivism seem to me... kinda important philosophically, but less so in practice.

Yes, most of the time they don't matter, but then sometimes they do! I think in particular the wrongness of logical positivism matters a lot if you're trying to solve a problem like proving that an AI is aligned with human flourishing, because there's a specific, technical answer you want to guarantee, but it requires formalizing a lot of concepts that normally squeak by because all the formal work is being done by humans who share assumptions. But when you need the AI to share those assumptions, things get dicier.

Reply
Noah Birnbaum's Shortform
Ben Pace3h105

The effect seems natural and hard to prevent. Basically, certain authors get reputations for being high (quality * writing), and then it makes more sense for people to read their posts because both the floor and ceiling are higher in expectation. Then their worse posts get more readers (who vote) than posts of a similar quality by another author, whose floor and ceiling are probably lower.

I'm not sure the magnitude of the cost, or that one can realistically expect to ever prevent this effect. For instance, all Scott Alexander blogposts get more readership t... (read more)

Reply2
Differences in Alignment Behaviour between Single-Agent and Multi-Agent AI Systems
StanislavKrym3h10

As for measuring alignment, one could do something similar to Claude (and a version of GPT?) playing Undertale or another game where one can achieve goals in unethical ways, but isn't obliged to do so.[1] The experiment with Undertale is evidence for Claude being aligned. However, a YouTuber remarked that GPT suggested a line of action which would likely lead to the Genocide Ending. 

  1. ^

    Zero-sum games, like Diplomacy where o3 deceived a Claude into battling against Gemini, fall into the latter category since winning the game means that others lose.

Reply
Postrationality: An Oral History
Viliam3h40

The problems with logical positivism seem to me... kinda important philosophically, but less so in practice.

Kinda like having Gödel's incompleteness proof in mathematics -- yes, it is shocking and yes it has some serious consequences, but... it has practically zero effect on high-school mathematics.

Similarly, the fact that the verification principle is not itself an empirical fact is... a good argument against the generalization that everything must be an empirical fact. Yes, there is a place for abstractions, and general assumptions. And yet, I think that scienc... (read more)

Reply
What training data should developers filter to reduce risk from misaligned AI? An initial narrow proposal
Alek Westover3h20

Yeah that seems like a pretty reasonable false fact to insert into the model.

Reply
Decaeneus's Shortform
Vladimir_Nesov3h30

This is an example where framings are useful. An observation can be understood under multiple framings, some of which should intentionally exclude the compelling narratives (framings are not just hypotheses, but contexts where different considerations and inferences are taken as salient). This way, even the observations at risk of being rounded up to a popular narrative can contribute to developing alternative models, which occasionally grow up.

So even if there is a distortionary effect, it doesn't necessarily need to be resisted, if you additionally enter... (read more)

Reply
Obligated to Respond
SpectrumDT3h10

My guess is that the ideal is something like a default Ask culture with specific Guess culture contexts when it genuinely is worth the extra consideration.

IMO the ideal is a culture where everyone puts some reasonable effort into Guessing when feasible, but where Asking is also fully accepted.

Reply
Obligated to Respond
SpectrumDT3h10

The huge problem with that is the extreme deadliness of one of the ideologies that has rushed in to fill the void caused by the discrediting of Christianity: namely, the one (usually referred to vaguely by "progress" or "innovation") that views every personal, organizational and political decision through the lens of which decision best advances or accelerates science and technology.

Is this really a widely held ideology? My impression is that the AI race is driven by greed much more than ideology.

Reply
Differences in Alignment Behaviour between Single-Agent and Multi-Agent AI Systems
NotAWiz4rd4h10

We used Claude Sonnet 4 for the agents and narration, and Claude 3.5 Sonnet for most of the evaluation.

We haven't made any specific plans yet on how to measure alignment; our first goal was to check if there were observable differences at all, before making those differences properly measurable.

Reply
Noah Birnbaum's Shortform
Cole Wyeth4h32

I think if someone is very well-known, their making a particular statement can be informative in itself, which is probably part of the reason it is upvoted. 

Reply
How Well Does RL Scale?
Vladimir_Nesov4h20

RL can develop particular skills, and given that IMO has fallen this year, it's unclear that further general capability improvement is essential at this point. If RL can help cobble together enough specialized skills to enable automated adaptation (where the AI itself will become able to prepare datasets or RL environments etc. for specific jobs or sources of tasks), that might be enough. If RL enables longer contexts that can serve the role of continual learning, that also might be enough. Currently, there is a lot of low hanging fruit, and little things ... (read more)

Reply
Any corrigibility naysayers outside of MIRI?
Noosphere894h20

My takes on your comment:

Intelligence really is giant incomprehensible matrices with non-linear functions tossed in (at best).

I think this is possible, but I currently suspect the likely answer is more boring than that, and it's the fact that getting to AGI with a labor-light, compute-heavy approach (as evolution did) means that it's not worth investing much in interpretable AIs, even if strong AIs that were interpretable existed, and a similar condition holds in the modern era. But one of the effects of AIs that can replace humans is that it disproportion... (read more)

Reply
Any corrigibility naysayers outside of MIRI?
williawa4h30

Okay, sorry about this. You are right. I have thought up a somewhat nuanced view about how prosaic corrigibility could work, and I kind of just assumed it was the same as what Max had because he uses a lot of the same keywords I use when I think about this, but after actually reading the CAST article (or rather, parts 0 and 1), I realize we have really quite different views.

Reply
Which side of the AI safety community are you in?
RHollerith4h*20

Parenthetically, I do not yet know of anyone in the "never build ASI" camp and would be interested in reading or listening to such a person.

Reply
Any corrigibility naysayers outside of MIRI?
Max Harms4h20

Would you agree that we have about as much of a handle on what corrigibility is as we do on what an agent is? Like, I claim that I have some knowledge about corrigibility, even though it's imperfect and I have remaining confusions. And I'm wondering whether you think humanity is deeply confused about what corrigibility even is, or whether you think it's more like we have a handle on it but can't quite give its True Name.

Reply
AI #138 Part 2: Watch Out For Documents
jamiefisher4h20

If you want to slow down AI Research, why not try to use the "250 documents method" to actively poison the models and create more busy-work for the AI companies?

Reply
Zach Stein-Perlman's Shortform
Zach Stein-Perlman4h*30

Part is thinking about donation opportunities, like Bores. Hopefully I'll have more to say publicly at some point!

Reply
Zach Stein-Perlman's Shortform
anaguma4h10

Can you say more about the projects you're spending your time on now?

Reply
Any corrigibility naysayers outside of MIRI?
Max Harms4h40

Thanks for this follow-up. My basic thought on the comment above this one is that while I agree that you definitely can't get a perfectly corrigible agent on your first try, you might, by virtue of the training data resembling the lab setting, get something that in practice doesn't go off the rails, and instead allows some testing and iterative refinement (perhaps with the assistance of the AI). So I think "iteration [can/can't] fix a semi-corrigible agent" is the central crux.

I just read your WWIDF post (upvoted!) and while I agree that the issues you po... (read more)

Reply
Consider donating to AI safety champion Scott Wiener
TurnTrout4h60

I donated $7K to Scott and $7K to Bores. 

Reply2
Any corrigibility naysayers outside of MIRI?
Max Harms4h20

Yeah, thanks. Feel free to DM me or whatever if/when you finish a post.

One thing I want to make clear is that I'm asking about the feasibility of corrigibility in a weak superintelligence, not whether setting out to build such a thing is wise or stable.

Reply
Penny's Hands
Logan Riggs4h20

I was actually expecting Penny to develop dystonia coincidentally, and the RL would tie in by needing to be learned in reverse, i.e. optimizing from dystonic to normal. It is a much more pleasant ending than the protagonist's tone the whole way through would suggest. 

If I was writing a fanfic of this, I'd keep the story as is (+ or - the last paragraph), but then continue into the present moment which leads to the realization.

Reply
Any corrigibility naysayers outside of MIRI?
PeterMcCluskey5h42

you can't just train your ASI for corrigibility because it will sit and do nothing

I'm confused. That doesn't sound like what Max means by corrigibility. A corrigible ASI would respond to requests from its principal(s) as a subgoal of being corrigible, rather than just sit and do nothing.

Or did you mean that you need to do some next-token training in order to get it to be smart enough for corrigibility training to be feasible? And that next-token training conflicts with corrigibility?

Reply1
plex's Shortform
plex5h20

Thanks, updated to forecasters, does that seem fair?

Also, I know this is super hard, but do you have a sense of what superforecasters might have guessed back then?

Reply
Noah Birnbaum's Shortform
Noah Birnbaum5h177

While I think LW’s epistemic culture is better than most, one thing that seems pretty bad is that occasionally mediocre/shitty posts get lots of upvotes simply because they’re written by [insert popular rationalist thinker].

Of course, if LW were truly meritocratic (which it should be), this shouldn’t matter — but in my experience, it descriptively does.

Without naming anyone (since that would be unproductive), I wanted to know if others notice this too? And aside from simply trying not to upvote something because it’s written by a popular author, anyone have good ideas for preventing this?

Reply
How Well Does RL Scale?
Toby_Ord5h20

I do think that progress will slow down, though it's not my main claim. My main claim is that the tailwind of compute scaling will become weaker (unless some new scaling paradigm appears or a breakthrough saves this one). That is a piece in the puzzle of whether overall AI progress will accelerate or decelerate, and I'd ideally let people form their own judgments about the other pieces (e.g. whether recursive self-improvement will work, or whether funding will collapse in a market correction, taking away another tailwind of progress). But having a majo... (read more)

Reply
The Doomers Were Right
Dalcy5h52

Doomers predicted that the Y2K bug would cause massive death and destruction. They were wrong.

This seems like a misleading example of doomers being wrong (agree denotationally, disagree connotationally), since I think it's plausible that Y2K was not a big deal (to such an extent that "most people think it was a myth, hoax, or urban legend") precisely because of the mitigation efforts spurred by the doomsayers' predictions.

Reply
Decaeneus's Shortform
Decaeneus5h30

I've been thinking about what I'd call memetic black holes: regions of idea-space that have gathered enough mass that they will suck in anything adjacent to them, distorting judgement for believers and skeptics alike. 

The UFO topic is, I think, one such memetic black hole. The idea of aliens is so deeply ingrained in our collective psyche that it is very hard to resist the temptation to attach to it any kind of e.g. bizarre aerial observation. Crucially, I think this works both for those who definitely do and those who definitely don't believe that UF... (read more)

Reply
Zach Stein-Perlman's Shortform
Zach Stein-Perlman5h173

Recently I've been spending much less than half of my time on projects like AI Lab Watch. Instead I've been thinking about projects in the "strategy/meta" and "politics" domains. I'm not sure what I'll work on in the future but sometimes people incorrectly assume I'm on top of lab-watching stuff; I want people to know I'm not owning the lab-watching ball. I think lab-watching work is better than AI-governance-think-tank work for the right people on current margins and at least one more person should do it full-time; DM me if you're interested.

Reply
Which side of the AI safety community are you in?
p.b.5h20

For me the linked site with the statement doesn't load. And this was also the case when I first tried to access it yesterday. Seems less than ideal. 

Reply
Beware unfinished bridges
Adam Zerner5h20

Cool simulation!

I also have to add that I find the idea that a cyclist wouldn't cycle on a road absurd. I don't think I know a single person who wouldn't do this, presumably a US vs EU thing.

You mean the "No Way No How" group? If so, yeah, it feels implausible to me as well. I have a feeling that for people who were surveyed and said this, it wouldn't match their actual behavior if they were able to experience an area with genuinely calm roads.

Reply
Which side of the AI safety community are you in?
Dave Orr5h130

I agree that the statement doesn't require direct democracy but that seems like the most likely way to answer the question "do people want this".

Here's a brief list of things that were unpopular and broadly opposed that I nonetheless think were clearly good:

  • smallpox vaccine
  • seatbelts, and then seatbelt laws
  • cars
  • unleaded gasoline
  • microwaves (the oven, not the radiation)

Generally I feel like people sometimes oppose things that seem disruptive and can be swayed by demagogues. There's a reason that representative democracy works better than direct democracy. (Tho... (read more)

Reply
Homomorphically encrypted consciousness and its implications
jessicata5h20

Right so, by step 4 I'm not trying to assume that h is computationally tractable; the homomorphic case goes to show that it's probably not in general.

With respect to C, perhaps I'm not verbally expressing it that well, but the thing you are thinking of, where there is some omniscient perspective that includes "more than" just the low level of physics (where the "more than" could be certain informational/computational interconnections) would be an instance. Something like, "there is a way to construct an omniscient perspective, it just isn't going to be straightforwardly derivable from the physical state".

Reply
How Well Does RL Scale?
Daniel Kokotajlo5h60

That's reasonable, but it seems to be different from what these quotes imply:

So while we may see another jump in reasoning ability beyond GPT-5 by scaling RL training a further 10x, I think that is the end of the line for cheap RL-scaling.

... Now that RL-training is nearing its effective limit, we may have lost the ability to effectively turn more compute into more intelligence.

There are a bunch of quotes like the above that make it sound like you are predicting progress will slow down in a few years. But instead you are saying that progress will continue,... (read more)

Reply
Homomorphically encrypted consciousness and its implications
jessicata5h20

Yeah that seems like a case where non-locality is essential to the computation itself. I'm not sure how the "provably random noise from both" would work though. Like, it is possible to represent some string as the xor of two different strings, each of which are themselves uniformly random. But I don't know how to generalize that to computation in general.

I think some of the non locality is inherited from "no hidden variable theory". Like it might be local in MWI? I'm not sure.

Reply
avturchin's Shortform
Matt Goldenberg5h20

Seems like the first two points contradict each other. How can an LLM not be good at discovery and also automate human R&D?

Reply
A central AI alignment problem: capabilities generalization, and the sharp left turn
Dave Banerjee5h10

Many different training scenarios are teaching your AI the same instrumental lessons, about how to think in accurate and useful ways. Furthermore, those lessons are underwritten by a simple logical structure, much like the simple laws of arithmetic that abstractly underwrite a wide variety of empirical arithmetical facts about what happens when you add four people's bags of apples together on a table and then divide the contents among two people. 

But that attractor well? It's got a free parameter. And that parameter is what the AGI is optimizing for.

... (read more)
Reply
Which side of the AI safety community are you in?
TsviBT5h30

The OP is about two "camps" of people. Do you understand what camps are? Hopefully you can see that this indeed does induce the analog of "because the claim of fakeness is about the entirety of the image". They gain and direct funding, consensus, hiring, propaganda, vibes, parties, organizations, etc., approximately as a unit. Camp A is a 90% poison twinkie. The fact that you are trying to not process this is a problem.

Reply211
AI #139: The Overreach Machines
Nick_Tarleton6h40

none of this means Sam Altman shouldn’t be welcome at Lighthaven, and Holly clarifies that even she agrees on this

That is not my reading of the linked tweet (which just agrees that Lighthaven wasn't "dazzled"), and the opposite is my reading of this tweet and its replies.

Reply
Which side of the AI safety community are you in?
TsviBT6h20

Um, no, you responded to the OP with what sure seems like a proposed alternative split. The OP's split is about

people who self-identify as members of the AI safety community

I think you are making an actual mistake in your thinking, due to a significant gap in your thinking and not just a random thing, and with bad consequences, and I'm trying to draw your attention to it.

Reply
Which side of the AI safety community are you in?
Davidmanheim6h20

There's a huge difference between the types of cases, though. A 90% poisonous twinkie is certainly fine to call poisonous[1], but a 90% male group isn't reasonable to call male. You said "if most people who would say they are in C are not actually working that way and are deceptively presenting as C," and that seems far more like the latter than the former, because "fake" implies the entire thing is fake[2].

  1. ^

Though so is a 1% poisonous twinkie; perhaps the example should be that a meal that is 90% protein would be a "protein meal" without implying there is no non-prote

... (read more)
Reply
Which side of the AI safety community are you in?
Michaël Trazzi6h*20

I was trying to map out disagreements between people who are concerned enough about AI risk.

Agreed that this represents only a fraction of the people who talk about AI risk, and that there are a lot of people who will use some of these arguments as false justifications for their support of racing.

EDIT: as TsviBT pointed out in his comment, OP is actually about people who self-identify as members of the AI Safety community. Given that, I think that the two splits I mentioned above are still useful models, since most people I end up meeting who self-identify... (read more)

Reply
Which side of the AI safety community are you in?
TsviBT6h40

I was "let's build it before someone evil", I've left that particular viewpoint behind since realizing how hard aligning it is.

It was empirically infeasible (for the general AGI x-risk technical milieu) to explain this to you faster than you trying it for yourself, and one might have reasonably expected you to have been generally culturally predisposed to be open to having this explained to you. If this information takes so much energy and time to be gained, that doesn't bode well for the epistemic soundness of whatever stance is currently being taken by the funder-attended vibe-consensus. How would you explain this to your past self much faster?

Reply
Contra-Zombies? Contra-Zombies!: Chalmers as a parallel to Hume
Shiva's Right Foot6h10

I had a very busy IRL day yesterday and have intended to respond to this.

While I am initially inclined to simply do what you ask out of kindness, I am still convinced that I have no real reason to do so, and therefore acceding here may portray me as a pushover. This really is an instance where some human neutral third party input to this dispute would be extremely helpful, and I wish there was more of a culture online of such interventions. I would expect there to be such a culture here on LessWrong, but perhaps not.

Nevertheless I did consult a non-human medi... (read more)

Reply
The Doomers Were Right
Jasnah Kholin6h61

Can anyone reading this truly deny that those warnings came true from the doom sayer's perspective?

yes. your arrow of causality looks backwards to me - I don't see divorce destigmatization -> more divorce. in the divorce case it's clearly more divorce -> destigmatization. i don't remember where to find the posts about how the laws that allow divorce came after the spike in divorce, and not the other way around.

there is an important point here. i only recently re-evaluated my opinion on TV and decided the doomers were right there. but it sure look to m... (read more)

Reply
Which side of the AI safety community are you in?
Quinn6h20

Part of the implementation choices might be alienating -- the day after I signed, I saw "Let's take our future back from Big Tech." in yesterday's announcement email, and maybe a lot of people who work at large tech companies and are on the fence don't like that brand of populism.

Reply
Which side of the AI safety community are you in?
TsviBT6h22

I think we're just trying to do different things here... I'm trying to describe empirical clusters of people / orgs, you're trying to describe positions, maybe? And I'm taking your descriptions as pointers to clusters of people, of the form "the cluster of people who say XYZ". I think my interpretation is appropriate here because there is so much importance-weighted abject insincerity in publicly stated positions regarding AGI X-risk that it just doesn't make much sense to focus on the stated positions as positions.

Like, the actual people at The Curve or w... (read more)

Reply
AI #139: The Overreach Machines
infinibot276h52

You can either keep them on a short leash and do code review, or you can

Is there a missing segment here? It doesn't seem like a stylistic segue to the next section.

Reply
AI #139: The Overreach Machines
StanislavKrym6h10

Gary Marcus offered Elon Musk 10:1 odds on the bet, offering to go up to $1 million dollars using Elon Musk’s definition of ‘capable of doing anything a human with a computer can do, but not smarter than all humans combined’, but I’m sure Elon Musk could hold out for 20:1 and he’d get it. By that definition, the chance Grok 5 will count seems very close to epsilon. No, just no.

Nitpick: we don't know what Musk's researchers actually did. If they found the actually capable neuralese architecture, then we are done.  But what is the probability that they ... (read more)

Reply
Any corrigibility naysayers outside of MIRI?
johnswentworth6h40

As for how that gets to "definitely can't": the problem above means that, even if we nominally have time to fiddle and test the system, iteration would not actually be able to fix the relevant problems. And so the situation is strategically equivalent to "we need to get it right on the first shot", at least for the core difficult parts (like e.g. understanding what we're even aiming for).

And as for why that's hard to the point of de-facto impossibility with current knowledge... try the ball-cup exercise, then consider the level of detailed understanding re... (read more)

Reply
The Doomers Were Right
Lucas Spailier6h123

While I agree at a basic level, this also seems like a motte-and-bailey.

There is clearly a vibe that all doomers have obviously always been wrong. The author is clearly trying to push back against that vibe. I too prefer arguing at 'motte' level, but vibes (baileys) matter, and pushing back against one should not require a long airtight argument that stands up to the stronger version of the claims being made. Even though I agree the stronger version would be better, that's true for both sides of any debate.

Reply
plex's Shortform
NunoSempere6h187

I mentioned this before, but that interface didn't allow for wide spreads, so the thing that you might be looking at is a plot of people's medians, not the whole distribution. In general Hypermind's user interface was so so shitty, and they paid so poorly, if at all, that I don't think it's fair to describe that screenshot as "superforecasters".

Reply1
The Doomers Were Right
cesspool6h91

That's comparing apples to oranges.  There are doomers and doomers.  I don't think the "doomers" predicting the Rapture or some other apocalypse are the same thing as the "doomers" predicting the moral decline of society.  The two categories overlap in many people, but they are distinct, and I think it's misleading to conflate them.  (Which is kind of a critique of the premise of the article as a whole--I would put the AI doomers in the former category, but the article only gives examples from the latter.)

The existential risk doomers hi... (read more)

Reply1
Beware unfinished bridges
Tiuto6h30

I had been thinking about the exact same topic when I read this article, only I was using bus routes in my analogy. I created a quick program to simulate these dynamics[1].

It's very simple, there is a grid of squares, let's say 100 by 100, each square has some other square randomly assigned as its goal. Then I generate some paths via random walks until some fraction of squares are paths. Then I check what fraction of squares are connected to their goal via a path.
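
(Not the author's program, but a minimal sketch of that kind of simulation; the grid size, path fraction, walk length, and the rule that a square counts as connected when it and its goal sit on the same path component are all assumptions:)

```python
import random
from collections import deque

def simulate(n=100, path_fraction=0.3, seed=0):
    """Fraction of squares whose randomly assigned goal is reachable through path squares."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(n) for y in range(n)]
    goal = {c: rng.choice(cells) for c in cells}       # each square gets a random goal

    # Lay down paths via random walks until the target fraction of squares are path squares.
    path = set()
    while len(path) < path_fraction * n * n:
        x, y = rng.randrange(n), rng.randrange(n)
        for _ in range(4 * n):                         # one random walk
            path.add((x, y))
            dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            x = min(max(x + dx, 0), n - 1)
            y = min(max(y + dy, 0), n - 1)

    # Flood-fill connected components of path squares (4-neighbour adjacency).
    comp, label = {}, 0
    for c in path:
        if c in comp:
            continue
        label += 1
        comp[c] = label
        queue = deque([c])
        while queue:
            cx, cy = queue.popleft()
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nb = (cx + dx, cy + dy)
                if nb in path and nb not in comp:
                    comp[nb] = label
                    queue.append(nb)

    # A square counts as connected if it and its goal lie on the same path component.
    ok = sum(1 for c in cells if c in comp and goal[c] in comp and comp[c] == comp[goal[c]])
    return ok / len(cells)

if __name__ == "__main__":
    for frac in (0.05, 0.1, 0.2, 0.3, 0.5):
        print(frac, round(simulate(path_fraction=frac), 3))
```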

Doing this we get the following s-curve:

The y-axis shows the fraction of squares that are able... (read more)

Reply
Which side of the AI safety community are you in?
Michaël Trazzi6h20

You make a valid point. Here's another framing that makes the tradeoff explicit:

  • Group A) "Alignment research is worth doing even though it might provide cover for racing"
  • Group B) "The cover problem is too severe. We should focus on race-stopping work instead"
Reply
Which side of the AI safety community are you in?
TsviBT6h50

I guess we could say "mostly fake", but also there's important senses in which "mostly fake" implies "fake simpliciter". E.g. a twinkie made of "mostly poison" is just "a poisonous twinkie". Often people do, and should, summarize things and then make decisions based on the summaries, e.g. "is it poison, or no" --> "can I eat it, or no". My guess is that the conditions under which it would make sense for you to treat someone as genuinely holding position C, e.g. for purposes of allocating funding to them, are currently met by approximately no one. I coul... (read more)

Reply
Resampling Conserves Redundancy (Approximately)
johnswentworth6h20

That seems like a cool idea for the mediation condition, but isn't it trivial for the redundancy conditions?

Indeed, that specific form doesn't work for the redundancy conditions. We've been fiddling with it.

Reply
Any corrigibility naysayers outside of MIRI?
johnswentworth7h100

The anti-naturality problems are an issue, especially if you want to build the thing via standard RL-esque training, but they're not the first things which will kill you.

The story in the post you link is a pretty standard training story, and runs into the same immediate problems which standard training stories usually run into:

  • The humans will feed the system incorrect data.
  • Insofar as the system is capable, it will learn to predict the humans' own errors, as opposed to the thing the humans intended (and this will get worse as capabilities increase).
  • Insofar
... (read more)
Reply
Dead-switches as AI safety tools
Jesper L.7h10

I disagree a bit with your logic here. If 60% of ChatGPT's GPUs are cut off as a result of one switch in just one datacenter, the whole model is reduced to something other than what it was. Users with simple queries won't notice (right away). But the model will get dumber instantly. 

(How will it copy its weights then?)

Reply
Dead-switches as AI safety tools
Jesper L.7h10

What do you think of the argument here (if you read that far) to build this into progenitor models? This idea does not apply to an SI, as I clarify in the text as well. 

 

Let's look at this from today. Current AI is unable to hide all deception from us. It seems reasonable to me that a switch would trigger before stealth SI is deployed. 

Reply
Differences in Alignment Behaviour between Single-Agent and Multi-Agent AI Systems
David Africa7h10

Which LLMs did you use (for judging, for generating narratives, for peers)? And how do you plan to measure alignment?

Reply
Penny's Hands
gwern7h70

The genre here is psychological horror fiction, and the style is first-person short story; so it's reminiscent of Edgar Allan Poe or Ted Chiang; but it's not clearly condensed or tightly edited the way those tend to be, and the narrator's style is prolix and euphuistic. From an editing perspective, I think the question I would have is to what extent this is a lack of editing & killing-your-darlings, and a deliberate unreliable-narrator stylistic choice in which the prose is trying to mirror the narrator's brute-force piano style or perhaps the dystonia... (read more)

Reply1
EU explained in 10 minutes
Martin Sustrik7h20

By the way, your comment shows one thing that may not be obvious from the outside (and maybe even from the inside): there are a lot of people who are in favour of the European project even if they never say so or act on it in any way. And not because it is cool and sexy, it most definitely isn't, but partly because of the historic experience (every family has stories like yours) and partly because they see the EU as a check on their national government, preventing it from going fully bonkers. That being said, this political capital is completely untapped.

Reply
Do One New Thing A Day To Solve Your Problems
Jasnah Kholin7h10

maybe i took all the low hanging fruit or something, but doing an entire new thing every day is A LOT. like, the things i have to do and didn't, it's because they are hard and take more than 5 minutes. also, i can't even check if it worked, and i don't actually have so many things to do!

like, do you really expect to have 365 small things to do? because that suggestion sounds like applause lights to me - designed to be hard to say "actually, that's insane!", while being totally unrealistic.

also, i agree with Taylor. there are things like fixing small problem, ... (read more)

Reply
Which side of the AI safety community are you in?
Davidmanheim7h20

If you said "mostly bullshit" or "almost always disengenious" I wouldn't argue, but would still question whether it's actually a majority of people in group C, which I'm doubtful of, but very unsure about - but saying it is fake would usually mean it is not a real thing anyone believes, rather than meaning that the view is unusual or confused or wrong.

Closely related to: You Don't Exist, Duncan.

Reply