William_S's Shortform

William_S

William_S's Shortform

1 min read22nd Mar 202329 comments

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a special post for quick takes by William_S. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

29 comments, sorted by

top scoring

Click to highlight new comments since: Today at 11:31 AM

[-]William_S5dΩ681578

I worked at OpenAI for three years, from 2021-2024 on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language model to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool.
I resigned from OpenAI on February 15, 2024.

[-]habryka5dΩ14244

Thank you for your work there. Curious what specifically prompted you to post this now, presumably you leaving OpenAI and wanting to communicate that somehow?

[-]William_S5dΩ15320

No comment.

[-]habryka5dΩ18414

Can you confirm or deny whether you signed any NDA related to you leaving OpenAI?

(I would guess a "no comment" or lack of response or something to that degree implies a "yes" with reasonably high probability. Also, you might be interested in this link about the U.S. labor board deciding that NDA's offered during severance agreements that cover the existence of the NDA itself have been ruled unlawful by the National Labor Relations Board when deciding how to respond here)

[-]gwern5dΩ255955

I think it is safe to infer from the conspicuous and repeated silence by ex-OA employees when asked whether they signed a NDA which also included a gag order about the NDA, that there is in fact an NDA with a gag order in it, presumably tied to the OA LLC PPUs (which are not real equity and so probably even less protected than usual).

[-]Ben Pace5d1910

Does anyone know if it's typically the case that people under gag orders about their NDAs can talk to other people who they know signed the same NDAs? That is, if a bunch of people quit a company and all have signed self-silencing NDAs, are they normally allowed to talk to each other about why they quit and commiserate about the costs of their silence?

[-]O O4dΩ6130

Daniel K seems pretty open about his opinions and reasons for leaving. Did he not sign an NDA and thus gave up whatever PPUs he had?

[-]LawrenceC4dΩ7173

When I spoke to him a few weeks ago (a week after he left OAI), he had not signed an NDA at that point, so it seems likely that he hasn't.

[-]James Payor4dΩ352

By "gag order" do you mean just as a matter of private agreement, or something heavier-handed, with e.g. potential criminal consequences?

I have trouble understanding the absolute silence we seem to be having. There seem to be very few leaks, and all of them are very mild-mannered and are failing to build any consensus narrative that challenges OA's press in the public sphere.

Are people not able to share info over Signal or otherwise tolerate some risk here? It doesn't add up to me if the risk is just some chance of OA trying to then sue you to bankruptcy, especially since I think a lot of us would offer support in that case, and the media wouldn't paint OA in a good light for it.

I am confused. (And I grateful to William for at least saying this much, given the climate!)

[-]isabel3d2218

I would guess that there isn’t a clear smoking gun that people aren’t sharing because of NDAs, just a lot of more subtle problems that add up to leaving (and in some cases saying OpenAI isn’t being responsible etc).

This is consistent with the observation of the board firing Sam but not having a clear crossed line to point at for why they did it.

It’s usually easier to notice when the incentives are pointing somewhere bad than to explain what’s wrong with them, and it’s easier to notice when someone is being a bad actor than it is to articulate what they did wrong. (Both of these run a higher risk of false positives relative to more crisply articulatable problems.)

[-]Nate Showell4d8-4

The lack of leaks could just mean that there's nothing interesting to leak. Maybe William and others left OpenAI over run-of-the-mill office politics and there's nothing exceptional going on related to AI.

[-]Jackson Silver4d117

At least one of them has explicitly indicated they left because of AI safety concerns, and this thread seems to be insinuating some concern - Ilya Sutskever's conspicuous silence has become a meme, and Altman recently expressed that he is uncertain of Ilya's employment status. There still hasn't been any explanation for the boardroom drama last year.

If it was indeed run-of-the-mill office politics and all was well, then something to the effect of "our departures were unrelated, don't be so anxious about the world ending, we didn't see anything alarming at OpenAI" would obviously help a lot of people and also be a huge vote of confidence for OpenAI.

It seems more likely that there is some (vague?) concern but it's been overridden by tremendous legal/financial/peer motivations.

[-]Martín Soto4d40

What's PPU?

[-]MondSemmel4d92

From here:

Profit Participation Units (PPUs) represent a unique compensation method, distinct from traditional equity-based rewards. Unlike shares, stock options, or profit interests, PPUs don't confer ownership of the company; instead, they offer a contractual right to participate in the company's future profits.

[-]Linch5d330

(not a lawyer)

My layman's understanding is that managerial employees are excluded from that ruling, unfortunately. Which I think applies to William_S if I read his comment correctly. (See Pg 11, in the "Excluded" section in the linked pdf in your link)

[-]PhilosophicalSoul3d168

I am a lawyer.

I think one key point that is missing is this: regardless of whether the NDA and the subsequent gag order is legitimate or not; William would still have to spend thousands of dollars on a court case to rescue his rights. This sort of strong-arm litigation has become very common in the modern era. It's also just... very stressful. If you've just resigned from a company you probably used to love, you likely don't want to fish all of your old friends, bosses and colleagues into a court case.

Edit: also, if William left for reasons involving AGI safety - maybe entering into (what would likely be a very public) court case would be counteractive to their reason for leaving? You probably don't want to alarm the public by flavouring existential threats in legal jargon. American judges have the annoying tendency to valorise themselves as celebrities when confronting AI (see Musk v Open AI).

[-]Linch19h40

I can see some arguments in your direction but would tentatively guess the opposite.

[-]JenniferRM5d163

What are your timelines like? How long do YOU think we have left?

I know several CEOs of small AGI startups who seem to have gone crazy and told me that they are self inserts into this world, which is a simulation of their original self's creation. However, none of them talk about each other, and presumably at most one of them can be meaningfully right?

One AGI CEO hasn't gone THAT crazy (yet), but is quite sure that the November 2024 election will be meaningless because pivotal acts will have already occurred that make nation state elections visibly pointless.

Also I know many normies who can't really think probabilistically and mostly aren't worried at all about any of this... but one normy who can calculate is pretty sure that we have AT LEAST 12 years (possibly because his retirement plans won't be finalized until then). He also thinks that even systems as "mere" as TikTok will be banned before the November 2024 election because "elites aren't stupid".

I think I'm more likely to be better calibrated than any of these opinions, because most of them don't seem to focus very much on "hedging" or "thoughtful doubting", whereas my event space assigns non-zero probability to ensembles that contain such features of possible futures (including these specific scenarios).

[-]Mitchell_Porter4d108

Wondering why this has so many disagreement votes. Perhaps people don't like to see the serious topic of "how much time do we have left", alongside evidence that there's a population of AI entrepreneurs who are so far removed from consensus reality, that they now think they're living in a simulation.

(edit: The disagreement for @JenniferRM's comment was at something like -7. Two days later, it's at -2)

[-]JenniferRM3d84

For most of my comments, I'd almost be offended if I didn't say something surprising enough to get a "high interestingness, low agreement" voting response. Excluding speech acts, why even say things if your interlocutor or full audience can predict what you'll say?

And I usually don't offer full clean proofs in direct word. Anyone still pondering the text at the end, properly, shouldn't "vote to agree", right? So from my perspective... its fine and sorta even working as intended <3

However, also, this is currently the top-voted response to me, and if William_S himself reads it I hope he answers here, if not with text then (hopefully? even better?) with a link to a response elsewhere?

((EDIT: Re-reading everything above his, point, I notice that I totally left out the "basic take" that might go roughly like "Kurzweil, Altman, and Zuckerberg are right about compute hardware (not software or philosophy) being central, and there's a compute bottleneck rather than a compute overhang, so the speed of history will KEEP being about datacenter budgets and chip designs, and those happen on 6-to-18-month OODA loops that could actually fluctuate based on economic decisions, and therefore its maybe 2026, or 2028, or 2030, or even 2032 before things pop, depending on how and when billionaires and governments decide to spend money".))

Pulling honest posteriors from people who've "seen things we wouldn't believe" gives excellent material for trying to perform aumancy... work backwards from their posteriors to possible observations, and then forwards again, toward what might actually be true :-)

[-]mishka4d32

However, none of them talk about each other, and presumably at most one of them can be meaningfully right?

Why at most one of them can be meaningfully right?

Would not a simulation typically be "a multi-player game"?

(But yes, if they assume that their "original self" was the sole creator (?), then they would all be some kind of "clones" of that particular "original self". Which would surely increase the overall weirdness.)

[-]JenniferRM3d41

These are valid concerns! I presume that if "in the real timeline" there was a consortium of AGI CEOs who agreed to share costs on one run, and fiddled with their self-inserts, then they... would have coordinated more? (Or maybe they're trying to settle a bet on how the Singularity might counterfactually might have happened in the event of this or that person experiencing this or that coincidence? But in that case I don't think the self inserts would be allowed to say they're self inserts.)

Like why not re-roll the PRNG, to censor out the counterfactually simulable timelines that included me hearing from any of the REAL "self inserts of the consortium of AGI CEOS" (and so I only hear from "metaphysically spurious" CEOs)??

Or maybe the game engine itself would have contacted me somehow to ask me to "stop sticking causal quines in their simulation" and somehow I would have been induced by such contact to not publish this?

Mostly I presume AGAINST "coordinated AGI CEO stuff in the real timeline" along any of these lines because, as a type, they often "don't play well with others". Fucking oligarchs... maaaaaan.

It seems like a pretty normal thing, to me, for a person to naturally keep track of simulation concerns as a philosophic possibility (its kinda basic "high school theology" right?)... which might become one's "one track reality narrative" as a sort of "stress induced psychotic break away from a properly metaphysically agnostic mental posture"?

That's my current working psychological hypothesis, basically.

But to the degree that it happens more and more, I can't entirely shake the feeling that my probability distribution over "the time T of a pivotal acts occurring" (distinct from when I anticipate I'll learn that it happened which of course must be LATER than both T and later than now) shouldn't just include times in the past, but should actually be a distribution over complex numbers or something...

...but I don't even know how to do that math? At best I can sorta see how to fit it into exotic grammars where it "can have happened counterfactually" or so that it "will have counterfactually happened in a way that caused this factually possible recurrence" or whatever. Fucking "plausible SUBJECTIVE time travel", fucking shit up. It is so annoying.

Like... maybe every damn crazy AGI CEO's claims are all true except the ones that are mathematically false?

How the hell should I know? I haven't seen any not-plausibly-deniable miracles yet. (And all of the miracle reports I've heard were things I was pretty sure the Amazing Randi could have duplicated.)

All of this is to say, Hume hasn't fully betrayed me yet!

Mostly I'll hold off on performing normal updates until I see for myself, and hold off on performing logical updates until (again!) I see a valid proof for myself <3

[-]Lech Mazur2d21

I know several CEOs of small AGI startups who seem to have gone crazy and told me that they are self inserts into this world, which is a simulation of their original self's creation

Do you know if the origin of this idea for them was a psychedelic or dissociative trip? I'd give it at least even odds, with most of the remaining chances being meditation or Eastern religions...

[-]JenniferRM2d90

Wait, you know smart people who have NOT, at some point in their life: (1) taken a psychedelic NOR (2) meditated, NOR (3) thought about any of buddhism, jainism, hinduism, taoism, confucianisn, etc???

To be clear to naive readers: psychedelics are, in fact, non-trivially dangerous.

I personally worry I already have "an arguably-unfair and a probably-too-high share" of "shaman genes" and I don't feel I need exogenous sources of weirdness at this point.

But in the SF bay area (and places on the internet memetically downstream from IRL communities there) a lot of that is going around, memetically (in stories about) and perhaps mimetically (via monkey see, monkey do).

The first time you use a serious one you're likely getting a permanent modification to your personality (+0.5 stddev to your Openness?) and arguably/sorta each time you do a new one, or do a higher dose, or whatever, you've committed "1% of a personality suicide" by disrupting some of your most neurologically complex commitments.

To a first approximation my advice is simply "don't do it".

HOWEVER: this latter consideration actually suggests: anyone seriously and truly considering suicide should perhaps take a low dose psychedelic FIRST (with at least two loving tripsitters and due care) since it is also maybe/sorta "suicide" but it leaves a body behind that most people will think is still the same person and so they won't cry very much and so on?

To calibrate this perspective a bit, I also expect that even if cryonics works, it will also cause an unusually large amount of personality shift. A tolerable amount. An amount that leaves behind a personality that similar-enough-to-the-current-one-to-not-have-triggered-a-ship-of-theseus-violation-in-one-modification-cycle. Much more than a stressful day and then bad nightmares and a feeling of regret the next day, but weirder. With cryonics, you might wake up to some effects that are roughly equivalent to "having taken a potion of youthful rejuvenation, and not having the same birthmarks, and also learning that you're separated-by-disjoint-subjective-deaths from LOTS of people you loved when you experienced your first natural death" for example.This is a MUCH BIGGER CHANGE than just having a nightmare and a waking up with a change of heart (and most people don't have nightmares and changes of heart every night (at least: I don't and neither do most people I've asked)).

Remember, every improvement is a change, though not every change is an improvement. A good "epistemological practice" is sort of a idealized formal praxis for making yourself robust to "learning any true fact" and changing only in GOOD ways from such facts.

A good "axiological practice" (which I don't know of anyone working on except me (and I'm only doing it a tiny bit, not with my full mental budget)) is sort of an idealized formal praxis for making yourself robust to "humanely heartful emotional changes"(?) and changing only in <PROPERTY-NAME-TBD> ways from such events.

(Edited to add: Current best candidate name for this property is: "WISE" but maybe "healthy" works? (It depends on whether the Stoics or Nietzsche were "more objectively correct" maybe? The Stoics, after all, were erased and replaced by Platonism-For-The-Masses (AKA "Christianity") so if you think that "staying implemented in physics forever" is critically important then maybe "GRACEFUL" is the right word? (If someone says "vibe-alicious" or "flowful" or "active" or "strong" or "proud" (focusing on low latency unity achieved via subordination to simply and only power) then they are probably downstream of Heidegger and you should always be ready for them to change sides and submit to metaphorical Nazis, just as Heidegger subordinated himself to actual Nazis without really violating his philosophy at all.)))

I don't think that psychedelics fits neatly into EITHER category. Drugs in general are akin to wireheading, except wireheading is when something reaches into your brain to overload one or more of your positive-value-tracking-modules, (as a trivially semantically invalid shortcut to achieving positive value "out there" in the state-of-affairs that your tracking modules are trying to track) but actual humans have LOTS of <thing>-tracking-modules and culture and science barely have any RIGOROUS vocabulary for any them.

Note that many of these neurological <thing>-tracking-modules were evolved.

Also, many of them will probably be "like hands" in terms of AI's ability to model them.

This is part of why AI's should be existentially terrifying to anyone who is spiritually adept.

AI that sees the full set of causal paths to modifying human minds will be "like psychedelic drugs with coherent persistent agendas". Humans have basically zero cognitive security systems. Almost all security systems are culturally mediated, and then (absent complex interventions) lots of the brain stuff freezes in place around the age of puberty, and then other stuff freezes around 25, and so on. This is why we protect children from even TALKING to untrusted adults: they are too plastic and not savvy enough. (A good heuristic for the lowest level of "infohazard" is "anything you wouldn't talk about in front of a six year old".)

Humans are sorta like a bunch of unpatchable computers, exposing "ports" to the "internet", where each of our port numbers is simply a lightly salted semantic hash of an address into some random memory location that stores everything, including our operating system.

Your word for "drugs" and my word for "drugs" don't point to the same memory addresses in the computer's implementing our souls. Also our souls themselves don't even have the same nearby set of "documents" (because we just have different memories n'stuff)... but the word "drugs" is not just one of the ports... it is a port that deserves a LOT of security hardening.

The bible said ~"thou shalt not suffer a 'pharmakeia' to live" for REASONS.

[-]O O4d2-3

I assume timelines are fairly long or this isn’t safety related. I don’t see a point in keeping PPUs or even caring about NDA lawsuits which may or may not happen and would take years in a short timeline or doomed world.

[-]mishka4d54

I think having a probability distribution over timelines is the correct approach. Like, in the comment above:

I think I'm more likely to be better calibrated than any of these opinions, because most of them don't seem to focus very much on "hedging" or "thoughtful doubting", whereas my event space assigns non-zero probability to ensembles that contain such features of possible futures (including these specific scenarios).

[-]O O4d1210

Even in probabilistic terms, the evidence of OpenAI members respecting their NDAs makes it more likely that this was some sort of political infighting (EA related) than sub-year takeoff timelines. I would be open to a 1 year takeoff, I just don't see it happening given the evidence. OpenAI wouldn't need to talk about raising trillions of dollars, companies wouldn't be trying to commoditize their products, and the employees who quit OpenAI would speak up.

Political infighting is in general just more likely than very short timelines, which would go in counter of most prediction markets on the matter. Not to mention, given it's already happened with the firing of Sam Altman, it's far more likely to have happened again.

If there was a probability distribution of timelines, the current events indicate sub 3 year ones have negligible odds. If I am wrong about this, I implore the OpenAI employees to speak up. I don't think normies misunderstand probability distributions, they just usually tend not to care about unlikely events.

[-]mishka4d54

No, OpenAI (assuming that it is a well-defined entity) also uses a probability distribution over timelines.

(In reality, every member of its leadership has their own probability distribution, and this translates to OpenAI having a policy and behavior formulated approximately as if there is some resulting single probability distribution).

The important thing is, they are uncertain about timelines themselves (in part, because no one knows how perplexity translates to capabilities, in part, because there might be difference with respect to capabilities even with the same perplexity, if the underlying architectures are different (e.g. in-context learning might depend on architecture even with fixed perplexity, and we do see a stream of potentially very interesting architectural innovations recently), in part, because it's not clear how big is the potential of "harness"/"scaffolding", and so on).

This does not mean there is no political infighting. But it's on the background of them being correctly uncertain about true timelines...

Compute-wise, inference demands are huge and growing with popularity of the models (look how much Facebook did to make LLama 3 more inference-efficient).

So if they expect models to become useful enough for almost everyone to want to use them, they should worry about compute, assuming they do want to serve people like they say they do (I am not sure how this looks for very strong AI systems; they will probably be gradually expanding access, and the speed of expansion might depend).

[-]William_S1yΩ350

From discussion with Logan Riggs (Eleuther) who worked on the tuned lens: the tuned lens suggests that the residual stream at different layers go through some linear transformations and so aren’t directly comparable. This would interfere with a couple of methods for trying to understand neurons based on weights: 1) the embedding space view 2) calculating virtual weights between neurons in different layers.

However, we could try correcting these using the transformations learned by the tuned lens to translate between the residual stream at different layers, and maybe this would make these methods more effective. By default I think the tuned lens learns only the transformation needed to predict the output token but the method could be adapted to retrodict the input token from each layer as well, we’d need both. Code for tuned lens is at https://github.com/alignmentresearch/tuned-lens

Moderation Log