Quick Takes

lc (1h)

We will witness a resurgent alt-right movement soon, this time absent the institutional backlash that kept it from growing during the mid-2010s. I could see Nick Fuentes becoming a Congressman, or at least a major participant in Republican party politics, within the next 10 years if AI/gene editing doesn't change things much.

habryka (2d)

Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company were now performing assassinations of U.S. citizens.

Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.

elifland (2d)
I think you'd need 356 or more people in the population to make there be a >5% chance of 2+ deaths in a 2-month span from that population.
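
A minimal sketch of that calculation (the ~0.6% annual all-cause mortality rate is my assumption for illustration; elifland's actual inputs aren't stated):

```python
from math import comb

ANNUAL_DEATH_RATE = 0.006                      # assumption: baseline all-cause mortality
p = 1 - (1 - ANNUAL_DEATH_RATE) ** (2 / 12)    # per-person P(death) within a 2-month span

def p_two_or_more_deaths(n: int) -> float:
    """P(at least 2 deaths among n independent people over the 2-month window)."""
    p0 = (1 - p) ** n                           # nobody dies
    p1 = comb(n, 1) * p * (1 - p) ** (n - 1)    # exactly one death
    return 1 - p0 - p1

n = 1
while p_two_or_more_deaths(n) <= 0.05:
    n += 1
print(n)   # ~350-360 under this mortality assumption, close to the 356 figure above
```
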
aphyer (1h)

Shouldn't that be counting the number squared rather than the number?

isabel (14h)
I think there should be some sort of adjustment for Boeing not being exceptionally sus before the first whistleblower death - we shouldn't privilege Boeing until after the first death, and should be thinking across all industries big enough that the news would report on the deaths of whistleblowers. Which I think makes it not significant again.

I might have updated at least a bit against the weakness of single forward passes, based on intuitions about the amount of compute that huge context windows (e.g. Gemini 1.5 - 1 million tokens) might provide to a single forward pass, even if that compute is limited serially.

quetzal_rainbow (6d)
https://arxiv.org/abs/2404.15758 "We show that transformers can use meaningless filler tokens (e.g., '......') in place of a chain of thought to solve two hard algorithmic tasks they could not solve when responding without intermediate tokens. However, we find empirically that learning to use filler tokens is difficult and requires specific, dense supervision to converge."
Bogdan Ionut Cirstea (6d)
Thanks, seen it; see also the exchanges in the thread here: https://twitter.com/jacob_pfau/status/1784446572002230703. 

I looked over it, and I should note that "transformers are in TC0" is not a very useful statement for predicting capabilities. Transformers are Turing-complete given rational inputs (see the original paper), and their being in TC0 basically means they can implement whatever computation you can implement using a boolean circuit for a fixed amount of available compute, which amounts to "whatever computation is practical to implement".

William_S (2d)

I worked at OpenAI for three years, from 2021-2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using a language model to generate explanations for features in language models. I was then promoted to managing a team of 4 people, which worked on trying to understand language model features in context, leading to t... (read more)


I am a lawyer. 

I think one key point that is missing is this: regardless of whether the NDA and the subsequent gag order are legitimate or not, William would still have to spend thousands of dollars on a court case to reclaim his rights. This sort of strong-arm litigation has become very common in the modern era. It's also just... very stressful. If you've just resigned from a company you probably used to love, you likely don't want to drag all of your old friends, bosses and colleagues into a court case.

Edit: also, if William left for reasons involving... (read more)

JenniferRM (13h)
These are valid concerns! I presume that if "in the real timeline" there was a consortium of AGI CEOs who agreed to share costs on one run, and fiddled with their self-inserts, then they... would have coordinated more? (Or maybe they're trying to settle a bet on how the Singularity might counterfactually have happened in the event of this or that person experiencing this or that coincidence? But in that case I don't think the self inserts would be allowed to say they're self inserts.)

Like why not re-roll the PRNG, to censor out the counterfactually simulable timelines that included me hearing from any of the REAL "self inserts of the consortium of AGI CEOs" (and so I only hear from "metaphysically spurious" CEOs)?? Or maybe the game engine itself would have contacted me somehow to ask me to "stop sticking causal quines in their simulation" and somehow I would have been induced by such contact to not publish this?

Mostly I presume AGAINST "coordinated AGI CEO stuff in the real timeline" along any of these lines because, as a type, they often "don't play well with others". Fucking oligarchs... maaaaaan.

It seems like a pretty normal thing, to me, for a person to naturally keep track of simulation concerns as a philosophic possibility (it's kinda basic "high school theology" right?)... which might become one's "one track reality narrative" as a sort of "stress induced psychotic break away from a properly metaphysically agnostic mental posture"? That's my current working psychological hypothesis, basically.

But to the degree that it happens more and more, I can't entirely shake the feeling that my probability distribution over "the time T of a pivotal act occurring" (distinct from when I anticipate I'll learn that it happened, which of course must be LATER than both T and later than now) shouldn't just include times in the past, but should actually be a distribution over complex numbers or something...

...but I don't even know how to do that math? At best I
JenniferRM (13h)
For most of my comments, I'd almost be offended if I didn't say something surprising enough to get a "high interestingness, low agreement" voting response. Excluding speech acts, why even say things if your interlocutor or full audience can predict what you'll say? And I usually don't offer full clean proofs in direct words. Anyone still pondering the text at the end, properly, shouldn't "vote to agree", right? So from my perspective... it's fine and sorta even working as intended <3

However, also, this is currently the top-voted response to me, and if William_S himself reads it I hope he answers here, if not with text then (hopefully? even better?) with a link to a response elsewhere.

((EDIT: Re-reading everything above this point, I notice that I totally left out the "basic take" that might go roughly like "Kurzweil, Altman, and Zuckerberg are right about compute hardware (not software or philosophy) being central, and there's a compute bottleneck rather than a compute overhang, so the speed of history will KEEP being about datacenter budgets and chip designs, and those happen on 6-to-18-month OODA loops that could actually fluctuate based on economic decisions, and therefore it's maybe 2026, or 2028, or 2030, or even 2032 before things pop, depending on how and when billionaires and governments decide to spend money".))

Pulling honest posteriors from people who've "seen things we wouldn't believe" gives excellent material for trying to perform aumancy... work backwards from their posteriors to possible observations, and then forwards again, toward what might actually be true :-)

Something I'm confused about: what is the threshold that needs meeting for the majority of people in the EA community to say something like "it would be better if EAs didn't work at OpenAI"?

Imagining the following hypothetical scenarios over 2024/25, I can't confidently predict whether they'd individually cause that response within EA:

  1. Ten-fifteen more OpenAI staff quit for varied and unclear reasons. No public info is gained outside of rumours
  2. There is another board shakeup because senior leaders seem worried about Altman. Altman stays on
  3. Superalignment team
... (read more)
Thomas Kwa (1d)

You should update by ±1% on AI doom surprisingly frequently

This is just a fact about how stochastic processes work. If your p(doom) is Brownian motion in 1% steps starting at 50% and stopping once it reaches 0 or 1, then there will be about 50^2=2500 steps of size 1%. This is a lot! If we get all the evidence for whether humanity survives or not uniformly over the next 10 years, then you should make a 1% update 4-5 times per week. In practice there won't be as many due to heavy-tailedness in the distribution concentrating the updates in fewer events, and ... (read more)
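
A quick simulation sketch of just the stated model (a symmetric ±1% walk from 50%, absorbed at 0% and 100%):

```python
import random

def steps_to_absorption(start: int = 50) -> int:
    """Symmetric ±1 random walk on [0, 100], stopping at either end."""
    p, steps = start, 0
    while 0 < p < 100:
        p += random.choice((-1, 1))
        steps += 1
    return steps

trials = 2000
mean_steps = sum(steps_to_absorption() for _ in range(trials)) / trials
print(mean_steps)          # ~2500 steps of size 1%, matching 50^2
print(2500 / (10 * 52))    # ~4.8 such updates per week if spread evenly over 10 years
```
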

Thomas Kwa (8h)
I talked about this with Lawrence, and we both agree on the following:

  • There are mathematical models under which you should update >=1% in most weeks, and models under which you don't.
  • Brownian motion gives you 1% updates in most weeks. In many variants, like stationary processes with skew, stationary processes with moderately heavy tails, or Brownian motion interspersed with big 10%-update events that constitute <50% of your variance, you still have many weeks with 1% updates. Lawrence's model where you have no evidence until either AI takeover happens or 10 years passes does not give you 1% updates in most weeks, but this model is almost never the case for sufficiently smart agents.
  • Superforecasters empirically make lots of little updates, and rounding off their probabilities to larger infrequent updates makes their forecasts on near-term problems worse.
  • Thomas thinks that AI is the kind of thing where you can make lots of reasonable small updates frequently. Lawrence is unsure if this is the state that most people should be in, but it seems plausibly true for some people who learn a lot of new things about AI in the average week (especially if you're very good at forecasting).
  • In practice, humans often update in larger discrete chunks. Part of this is because they only consciously think about new information required to generate new numbers once in a while, and part of this is because humans have emotional fluctuations which we don't include in our reported p(doom).
  • Making 1% updates in most weeks is not always just irrational emotional fluctuations; it is consistent with how a rational agent would behave under reasonable assumptions. However, we do not recommend that people consciously try to make 1% updates every week, because fixating on individual news articles is not the right way to think about forecasting questions, and it is empirically better to just think about the problem directly rather than obsessing about how many updates you're m
JBlack (9h)
It definitely should not move by anything like a Brownian motion process. At the very least it should be bursty and updates should be expected to be very non-uniform in magnitude. In practice, you should not consciously update very often since almost all updates will be of insignificant magnitude on near-irrelevant information. I expect that much of the credence weight turns on unknown unknowns, which can't really be updated on at all until something turns them into (at least) known unknowns. But sure, if you were a superintelligence with practically unbounded rationality then you might in principle update very frequently.

The Brownian motion assumption is rather strong but not required for the conclusion. Consider the stock market, which famously has heavy-tailed, bursty returns. It happens all the time for the S&P 500 to move 1% in a week, but a 10% move in a week only happens a couple of times per decade. I would guess (and we can check) that most weeks have >0.6x of the average per-week variance of the market, which causes the median weekly absolute return to be well over half of what it would be if the market were Brownian motion with the same long-term variance.
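
A sketch of that comparison under an assumed heavy-tailed model (Student-t weekly returns with 3 degrees of freedom, rescaled to the same variance as the Gaussian; the parameters are mine, not checked against actual S&P data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, df = 1_000_000, 0.02, 3          # sigma: illustrative weekly std; df=3: heavy tails

gaussian_weeks = rng.normal(0.0, sigma, n)
t_weeks = rng.standard_t(df, n) * sigma / np.sqrt(df / (df - 2))   # same long-run variance

print(np.median(np.abs(gaussian_weeks)))   # ~0.675 * sigma (the Brownian benchmark)
print(np.median(np.abs(t_weeks)))          # ~0.65x the Gaussian figure - still well over half
```
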

Also, Lawrence tells me that in Tetlock's studies, superforecasters tend to make updates of 1-2% every week, which actually improves their accuracy.

I wish there were more discussion posts on LessWrong.

Right now it feels like it weakly if not moderately violates some sort of cultural norm to publish a discussion post (similar but to a lesser extent on the Shortform). Something low effort of the form "X is a topic I'd like to discuss. A, B and C are a few initial thoughts I have about it. Thoughts?"

It seems to me like something we should encourage though. Here's how I'm thinking about it. Such "discussion posts" currently happen informally in social circles. Maybe you'll text a friend. Maybe you'll brin... (read more)

"alignment researchers are found to score significantly higher in liberty (U=16035, p≈0)" This partly explains why so much of the alignment community doesn't support PauseAI!

"Liberty: Prioritizes individual freedom and autonomy, resisting excessive governmental control and supporting the right to personal wealth. Lower scores may be more accepting of government intervention, while higher scores champion personal freedom and autonomy..." 
https://forum.effectivealtruism.org/posts/eToqPAyB4GxDBrrrf/key-takeaways-from-our-ea-and-alignment-research-surveys... (read more)

lc (1d)

I seriously doubt on priors that Boeing corporate is murdering employees.

This sort of begs the question of why we don't observe other companies assassinating whistleblowers.

1a3orn (1d)
I mean, sure, but I've been updating in that direction a weirdly large amount.

Mathematical descriptions are powerful because they can be very terse. You can specify only the properties of a system and still get a well-defined system.

This is in contrast to writing algorithms and data structures, where you need concrete implementations of the algorithms and data structures to get a full description.

Johannes C. Mayer (1d)
Let xs be a finite list of natural numbers. Let xs' be the list that is xs sorted ascendingly. I could write down in full formality what it means for a list to be sorted, without ever talking at all about how you would go about calculating xs' given xs. That is the power I am talking about. We can say what something is, without talking about how to get it.

And yes, this still applies for constructive logic, because the property of being sorted is just a logical property of a list. It's a definition. To give a definition, I don't need to talk about what kind of algorithm would produce something that satisfies this condition. That is completely separate. And being able to see that as separate is a really useful abstraction, because it hides away many unimportant details.

Computer Science is about how-to-do-X knowledge, as SICP says. Math is about talking about stuff in full formal detail without talking about this how-to-do-X knowledge, which can get very complicated. How does a modern CPU add two 64-bit floating-point numbers? Certainly not in an obvious, simple way, because that would be way too slow. The CPU here illustrates the point as a sort of ultimate instantiation of implementation detail.
Dagon (22h)
I kind of see what you're saying, but I also rather think you're talking about specifying very different things, in a way that I don't think is required. The closer CS equivalent of math's "define a sorted list" is "determine if a list is sorted". I'd argue it's very close to equivalent to the math formality of whether a list is sorted. You can argue about the complexity behind the abstraction (math's foundations on set theory and symbols vs CS library and silicon foundations on memory storage and "list" indexing), but I don't think that's the point you're making.

When used for different things, they're very different in complexity. When used for the same things, they can be pretty similar.

Yes, that is a good point. I think you can totally write a program that checks, given two lists xs and xs' as input, that xs' is sorted and also contains exactly the elements of xs. That allows us to specify in code what it means that a list xs' is what I get when I sort xs.

And yes, I can do this without talking about how to sort a list. Namely, I give a property such that there is only one function implied by this property: the sorting function. I can totally constrain what the program can be (at least if we ignore runtime and memory stuff).
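
A minimal sketch of the checker described above (function name is mine):

```python
from collections import Counter

def is_sorted_version_of(xs: list[int], xs_prime: list[int]) -> bool:
    """Specification: xs_prime is ascending and is a permutation of xs.
    Says nothing about *how* to compute xs_prime from xs."""
    ascending = all(a <= b for a, b in zip(xs_prime, xs_prime[1:]))
    same_elements = Counter(xs) == Counter(xs_prime)
    return ascending and same_elements

# The property pins down the output uniquely; any actual sorting algorithm
# (quicksort, mergesort, ...) is a separate piece of how-to-do-it knowledge.
assert is_sorted_version_of([3, 1, 2], [1, 2, 3])
assert not is_sorted_version_of([3, 1, 2], [1, 2, 2])
```
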

The Edge home page featured an online editorial that downplayed AI art because it just combines images that already exist. If you look closely enough, human artwork is also a combination of things that already existed.

One example is Blackballed Totem Drawing: Roger 'The Rajah' Brown. James Pate made this charcoal drawing in 2016. It was the Individual Artist Winner of the Governor's Award for the Arts. At the microscopic scale, this artwork is just black particles embedded in a large sheet of paper. I doubt he made the paper he drew on, and the black... (read more)

Claude learns across different chats. What does this mean?

 I was asking Claude 3 Sonnet "what is a PPU" in the context of this thread. For that purpose, I pasted part of the thread.

Claude automatically assumed that OA meant Anthropic (instead of OpenAI), which was surprising.

I opened a new chat, copying the exact same text, but with OA replaced by GDM. Even then, Claude assumed GDM meant Anthropic (instead of Google DeepMind).

This seemed like interesting behavior, so I started toying around (in new chats) with more tweaks to the prompt to check its ro... (read more)

Dalcy (2d)

Thoughtdump on why I'm interested in computational mechanics:

  • one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
    • ... but i was initially i
... (read more)
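
As a toy illustration of the central object here (the HMM below is invented for illustration, not taken from the post): the "belief structures" in question are the Bayesian belief states an observer maintains over an HMM's hidden states, and the set of reachable beliefs is the state set of the mixed-state presentation (MSP).

```python
import numpy as np

# T[s][i, j] = P(next hidden state j AND emit symbol s | current hidden state i)
T = {
    0: np.array([[0.45, 0.05],
                 [0.10, 0.10]]),
    1: np.array([[0.05, 0.45],
                 [0.40, 0.40]]),
}

def update_belief(belief: np.ndarray, symbol: int) -> np.ndarray:
    """Bayesian update of the belief over hidden states after seeing one symbol."""
    unnormalized = belief @ T[symbol]
    return unnormalized / unnormalized.sum()

rng = np.random.default_rng(0)
belief = np.array([0.5, 0.5])
reached = [belief]
for _ in range(20):
    symbol_probs = np.array([(belief @ T[s]).sum() for s in (0, 1)])
    s = rng.choice(2, p=symbol_probs)
    belief = update_belief(belief, s)
    reached.append(belief)
print(np.round(reached, 3))   # each row is an MSP state: a belief over the 2 hidden states
```
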

I agree with you. 

Epsilon machine (and MSP) construction is most likely computationally intractable [I don't know an exact statement of such a result in the literature but I suspect it is true] for realistic scenarios. 

Scaling an approximate version of epsilon reconstruction therefore seems of prime importance. Real-world architectures and data have highly specific structure & symmetry that makes them different from completely generic HMMs. This most likely must be exploited.

The calculi of emergence paper has inspired many people but has n... (read more)

Pithy sayings are lossily compressed.

Yes.

For example: the common saying "Anything worth doing is worth doing [well/poorly]" needs more qualifiers. As it is, the respective opposite advice can often be just as useful - i.e., not very.

Better V1: "The cost/utility ratio of beneficial actions at minimum cost is often less favorable than it would be with greater investment."

Better V2: "If an action is beneficial, a flawed attempt may be preferable to none at all."

However, these are too wordy to be pithy, and in pop-culture transmission, accuracy is generally sacrificed in favor of catchiness.

Buck (3d)

[epistemic status: I think I’m mostly right about the main thrust here, but probably some of the specific arguments below are wrong. In the following, I'm much more stating conclusions than providing full arguments. This claim isn’t particularly original to me.]

I’m interested in the following subset of risk from AI:

  • Early: That comes from AIs that are just powerful enough to be extremely useful and dangerous-by-default (i.e. these AIs aren’t wildly superhuman).
  • Scheming: Risk associated with loss of control to AIs that arises from AIs scheming
    • So e.g. I exclu
... (read more)
Matthew Barnett (2d)
Can you be clearer on this point? To operationalize this, I propose the following question: what fraction of world GDP do you expect will be attributable to AI at the time we have these risky AIs that you are interested in? For example, are you worried about AIs that will arise when AI is 1-10% of the economy, or more like 50%? 90%?

One operationalization is "these AIs are capable of speeding up ML R&D by 30x with less than a 2x increase in marginal costs".

As in, if you have a team doing ML research, you can make them 30x faster with only <2x increase in cost by going from not using your powerful AIs to using them.

With these caveats:

  • The speed up is relative to the current status quo as of GPT-4.
  • The speed up is ignoring the "speed up" of "having better experiments to do due to access to better models" (so e.g., they would complete a fixed research task faster).
  • By "capable" o
... (read more)
Buck (2d)
When I said "AI control is easy", I meant "AI control mitigates most risk arising from human-ish-level schemers directly causing catastrophes"; I wasn't trying to comment more generally. I agree with your concern.
dkornai (4d)

Pain is the consequence of a perceived reduction in the probability that an agent will achieve its goals. 

In biological organisms, physical pain [say, in response to a limb being removed] is an evolutionary consequence of the fact that organisms with the capacity to feel physical pain avoided situations in which the subsystems generating the pain, which were required for their long-term goals [e.g. locomotion to a favourable position using the limb], were harmed.

This definition applies equally to mental pain [say, the pain felt when being expelled from a group of allies] w... (read more)

StartAtTheEnd (3d)
I think pain is a little bit different than that. It's the contrast between the current state and the goal state. This contrast motivates the agent to act, when the pain of the contrast becomes bigger than the (predicted) pain of acting.

As a human, you can decrease your pain by thinking that everything will be okay, or you can increase your pain by doubting the process. But it is unlikely that you will allow yourself to stop hurting, because your brain fears that a lack of suffering would result in a lack of progress (some wise people contest this, claiming that wu wei is correct). Another way you can increase your pain is by focusing more on the goal you want to achieve, sort of irritating/torturing yourself with the fact that the goal isn't achieved, to which your brain will respond by increasing the pain felt by the contrast, urging action.

Do you see how this differs slightly from your definition? Chronic pain is not a continuous reduction in agency, but a continuous contrast between a bad state and a good state, which makes one feel pain which motivates them to solve it (exercise, surgery, resting, looking for painkillers, etc). This generalizes to other negative feelings, for instance to hunger, which exists with the purpose of being less pleasant than the search for food, such that you seek food.

I warn you that avoiding negative emotions can lead to stagnation, since suffering leads to growth (unless we start wireheading, and make the avoidance of pain our new goal, because then we might seek hedonic pleasures and intoxicants).
dkornai (2d)
I would certainly agree with part of what you are saying. Especially the point that many important lessons are taught by pain [correct me if this is misinterpreting your comment]. Indeed, as a parent for example, if your goal is for your child to gain the capacity for self-sufficiency, a certain amount of painful lessons that reflect the inherent properties of the world are necessary to achieve such a goal.

On the other hand, I do not agree with your framing of pain as being the main motivator [again, correct me if required]. In fact, a wide variety of systems in the brain are concerned with calculating and granting rewards. Perhaps pain and pleasure are the two sides of the same coin, and reward maximisation and regret minimisation are identical. In practice however, I think they often lead to different solutions.

I also do not agree with your interpretation that chronic pain does not reduce agency. For family members of mine suffering from arthritis, their chronic pain renders them unable to do many basic activities, for example, access areas for which you would need to climb stairs. I would like to emphasise that it is not the disease which limits their "degrees of freedom" [at least in the short term]: were they to take a large amount of painkillers, they could temporarily climb stairs again.

Finally, I would suggest that your framing as a "contrast between the current state and the goal state" is basically an alternative way of talking about the transition probability from the current state to the goal state. In my opinion, this suggests that our conceptualisations of pain are overwhelmingly similar.

I think all criticism, all shaming, all guilt-tripping, all punishments and rewards directed at children - all of it is for the purpose of driving them to do certain actions. If your children do what you think is right, there's no need to do much of anything.

A more general and correct statement would be "Pain is for the sake of change, and all change is painful". But that change is for the sake of actions. I don't think that's too much of a simplification to be useful.

I think regret, too, is connected here. And there's certainly times when it seems like pain is the ... (read more)

Elizabeth (11d)

Check my math: how does Enovid compare to humming?

Nitric Oxide is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14ppm for women and 0.18ppm for men (sinus levels are 100x higher). journals.sagepub.com/doi/pdf/10.117…

Enovid is a nasal spray that produces NO. I had the damndest time quantifying Enovid, but this trial registration says 0.11ppm NO/hour. They deliver every 8h and I think that dose is amortized, so the true dose is 0.88 ppm. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response ... (read more)
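
The arithmetic, reproduced as a sketch (figures from the post; the amortization assumption may be wrong, as noted):

```python
enovid_rate_ppm_per_hour = 0.11   # from the trial registration cited above
dosing_interval_hours = 8
amortized_dose_ppm = enovid_rate_ppm_per_hour * dosing_interval_hours   # 0.88

baseline_nasal_no_ppm = {"women": 0.14, "men": 0.18}
for group, baseline in baseline_nasal_no_ppm.items():
    print(group, round(amortized_dose_ppm / baseline, 1), "x baseline")  # ~6.3x / ~4.9x
```
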

Elizabeth (4d)
I found the gotcha: Enovid has two other mechanisms of action. Someone pointed this out to me on my previous nitric oxide post, but it didn't quite sink in till I did more reading.
DanielFilan (4d)
What are the two other mechanisms of action?

citric acid and a polymer

Viliam (2d)

I suspect that in practice many people use the word "prioritize" to mean:

  • think short-term
  • only do legible things
  • remove slack

The Model-View-Controller architecture is very powerful. It allows us to separate concerns.

For example, if we want to implement an algorithm, we can write down only the data structures and algorithms that are used.

We might want to visualize the steps that the algorithm is performing, but this can be separated from the actual running of the algorithm.

If the algorithm is interactive, then instead of putting the interaction logic in the algorithm - which could be thought of as the rules of the world - we implement functionality that directly changes the... (read more)
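
A minimal sketch of that separation (class names are illustrative, not from the post): the algorithm (model) knows nothing about display or interaction; a view renders its state; a controller drives the stepping and would also own any interaction logic.

```python
class BubbleSortModel:
    """The 'rules of the world': one comparison/swap per step, no I/O."""
    def __init__(self, data):
        self.data = list(data)
        self.i = 0
        self.done = len(self.data) < 2

    def step(self):
        if self.done:
            return
        if self.data[self.i] > self.data[self.i + 1]:
            self.data[self.i], self.data[self.i + 1] = self.data[self.i + 1], self.data[self.i]
        self.i += 1
        if self.i >= len(self.data) - 1:
            self.i = 0
            self.done = self.data == sorted(self.data)


class ConsoleView:
    """Visualization of the model's state, separate from the algorithm itself."""
    def render(self, model):
        print(model.data)


class Controller:
    """Drives the model and triggers the view; user interaction would live here."""
    def __init__(self, model, view):
        self.model, self.view = model, view

    def run(self):
        while not self.model.done:
            self.model.step()
            self.view.render(self.model)


Controller(BubbleSortModel([3, 1, 2]), ConsoleView()).run()
```
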

Elizabeth (15d)

A very rough draft of a plan to test prophylactics for airborne illnesses.

Start with a potential superspreader event. My ideal is a large conference, with many attendees who travelled to get there, held in enclosed spaces with poor ventilation and air purification, in winter. Ideally >=4 days, so that people infected on day one are infectious while the conference is still running.

Call for sign-ups for testing ahead of time (disclosing all possible substances and side effects). Split volunteers into control and test group. I think you need ~500 sign ups in t... (read more)

gwern (15d)
This sounds like a bad plan because it will be a logistics nightmare (undermining randomization) with high attrition, and extremely high variance due to between-subject design (where subjects differ a ton at baseline, in addition to exposure) on a single occasion with uncontrolled exposures and huge measurement error where only the most extreme infections get reported (sometimes). You'll probably get non-answers, if you finish at all. The most likely outcome is something goes wrong and the entire effort is wasted.

Since this is a topic which is highly repeatable within-person (and indeed, usually repeats often through a lifetime...), this would make more sense as within-individual and using higher-quality measurements.

One good QS approach would be to exploit the fact that infections, even asymptomatic ones, seem to affect heart rate etc. as the body is damaged and begins fighting the infection. HR/HRV is now measurable off the shelf with things like the Apple Watch, AFAIK. So you could recruit a few tech-savvy conference-goers for measurements from a device they already own & wear. This avoids any 'big bang' and lets you prototype and tweak on a few people - possibly yourself? - before rolling it out, considerably de-risking it.

There are some people who travel constantly for business and going to conferences, and recruiting and managing a few of them would probably be infinitely easier than 500+ randos (if for no reason other than being frequent flyers they may be quite eager for some prophylactics), and you would probably get far more precise data out of them if they agree to cooperate for a year or so and you get eg 10 conferences/trips out of each of them, which you can contrast with their year-round baseline & exposome and measure asymptomatic infections or just overall health/stress. (Remember, variance reduction yields exponential gains in precision or sample-size reduction. It wouldn't be too hard for 5 or 10 people to beat a single 250vs250 one-off experi
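
A rough power simulation of that last point, with all numbers invented for illustration (a continuous outcome such as an HR-derived infection-burden score, a large between-person SD, and a modest treatment effect are assumptions, not estimates):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

delta   = 1.0   # assumed true effect of the prophylactic on the continuous outcome
sigma_b = 5.0   # assumed between-person SD (people differ a lot at baseline)
sigma_w = 2.0   # assumed within-person, per-trip noise

def between_design_pvalue(n_per_arm=250):
    """One-off 250 vs 250 between-subject trial."""
    control = rng.normal(0, sigma_b, n_per_arm) + rng.normal(0, sigma_w, n_per_arm)
    treated = rng.normal(0, sigma_b, n_per_arm) + rng.normal(0, sigma_w, n_per_arm) - delta
    return stats.ttest_ind(control, treated).pvalue

def within_design_pvalue(n_people=10, trips_per_person=10):
    """A few frequent travelers, each contributing paired on-drug vs off-drug trips."""
    diffs = []
    for baseline in rng.normal(0, sigma_b, n_people):
        off = baseline + rng.normal(0, sigma_w, trips_per_person)
        on  = baseline + rng.normal(0, sigma_w, trips_per_person) - delta
        diffs.extend(off - on)          # the person-level baseline cancels in the pairing
    return stats.ttest_1samp(diffs, 0).pvalue

sims = 500
power_between = np.mean([between_design_pvalue() < 0.05 for _ in range(sims)])
power_within  = np.mean([within_design_pvalue() < 0.05 for _ in range(sims)])
print(power_between, power_within)   # the small within-person design wins under these assumptions
```
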

All of the problems you list seem harder with repeated within-person trials. 
