LessWrong

Alexander Gietelink Oldenziel's Shortform

2Vladimir_Nesov11h

The best method of improving sample efficiency might be more like AlphaZero. The simplest method that's more likely to be discovered might be more like training on the same data over and over with diminishing returns. Since we are talking low-hanging fruit, I think it's reasonable that first forays into significantly improved sample efficiency with respect to real data are not yet much better than simply using more unique real data.

2Alexander Gietelink Oldenziel11h

I would be genuinely surprised if training a transformer on the pre2014 human Go data over and over would lead it to spontaneously develop alphaZero capacity. I would expect it to do what it is trained to: emulate / predict as best as possible the distribution of human play. To some degree I would anticipate the transformer might develop some emergent ability that might make it slightly better than Go-Magnus - as we've seen in other cases - but I'd be surprised if this would be unbounded. This is simply not what the training signal is.

2Vladimir_Nesov10h

We start with an LLM trained on 50T tokens of real data, however capable it ends up being, and ask how to reach the same level of capability with synthetic data. If it takes more than 50T tokens of synthetic data, then it was less valuable per token than real data. But at the same time, 500T tokens of synthetic data might train an LLM more capable than if trained on the 50T tokens of real data for 10 epochs. In that case, synthetic data helps with scaling capabilities beyond what real data enables, even though it's still less valuable per token. With Go, we might just be running into the contingent fact of there not being enough real data to be worth talking about, compared with LLM data for general intelligence. If we run out of real data before some threshold of usefulness, synthetic data becomes crucial (which is the case with Go). It's unclear if this is the case for general intelligence with LLMs, but if it is, then there won't be enough compute to improve the situation unless synthetic data also becomes better per token, and not merely mitigates the data bottleneck and enables further improvement given unbounded compute. I expect that if we could magically sample much more pre-2014 unique human Go data than was actually generated by actual humans (rather than repeating the limited data we have), from the same platonic source and without changing the level of play, then it would be possible to cheaply tune an LLM trained on it to play superhuman Go.

Alexander Gietelink Oldenziel27m20

I don't know what you mean by 'general intelligence' exactly but I suspect you mean something like human+ capability in a broad range of domains. I agree LLMs will become generally intelligent in this sense when scaled, arguably even are, for domains with sufficient data. But that's kind of the sticker right? Cave men didn't have the whole internet to learn from yet somehow did something that not even you seem to claim LLMs will be able to do: create the (date of the) Internet.

(Your last claim seems surprising. Pre-2014 games don't have close to the ELO of alphaZero. So a next-token would be trained to simulate a human player up tot 2800, not 3200+. )

On Privilege

shminux

The forum has been very much focused on AI safety for some time now, thought I'd post something different for a change. Privilege.

Here I define Privilege as an advantage over others that is invisible to the beholder. This may not be the only definition, or the central definition, or not how you see it, but that's the definition I use for the purposes of this post. I also do not mean it in the culture-war sense as a way to undercut others as in "check your privilege". My point is that we all have some privileges [we are not aware of], and also that nearly each one has a flip side.

In some way this is the inverse of The Lens That Does Not See Its Flaws: The...

(See More – 319 more words)

4Viliam10h

What are the advantages of noticing all of this? * better model of the world; * not being an asshole, i.e. not assuming that other people could do just as well as you, if they only were not so fucking lazy; * realizing that your chances to achieve something may be better than you expected, because you have all these advantages over most potential competitors, so if you hesitated to do something because "there are so many people, many of them could do it much better than I could", the actual number of people who could do it may be much smaller than you have assumed, and most of them will be busy doing something else instead.

localdeity1h20

Also:

Knowing the importance of the advantages people have makes you better able to judge how well people are likely to do, which lets you make better decisions when e.g. investing in someone's company or deciding who to hire for an important role (or marry).
Also orients you towards figuring out the difficult-to-see advantages people must have (or must lack), given the level of success that they've achieved and their visible advantages.
If you're in a position to influence what advantages people end up with—for example, by affecting the genes your children g

... (read more)

8Viliam14h

The article suggests "invisible advantage". Other options: "unnoticed advantage", "unknown advantage".

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse

Ω 253d

I want to draw attention to a new paper, written by myself, David "davidad" Dalrymple, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, and Joshua Tenenbaum.

In this paper we introduce the concept of "guaranteed safe (GS) AI", which is a broad research strategy for obtaining safe AI systems with provable quantitative safety guarantees. Moreover, with a sufficient push, this strategy could plausibly be implemented on a moderately short time scale. The key components of GS AI are:

A formal safety specification that mathematically describes what effects or behaviors are considered safe or acceptable.
A world model that provides a mathematical description of the environment of the AI system.
A verifier

...

(See More – 568 more words)

Joe_Collman2hΩ120

This seems interesting, but I've seen no plausible case that there's a version of (1) that's both sufficient and achievable. I've seen Davidad mention e.g. approaches using boundaries formalization. This seems achievable, but clearly not sufficient. (boundaries don't help with e.g. [allow the mental influences that are desirable, but not those that are undesirable])

The [act sufficiently conservatively for safety, relative to some distribution of safety specifications] constraint seems likely to lead to paralysis (either of the form [AI system does nothing]... (read more)

Forecasting: the way I think about it

Molly

11d

This is a linkpost for https://cuttyshark.substack.com/p/forecasting-the-way-i-think-about

This is the first post in a little series I'm slowly writing on how I see forecasting, particularly conditional forecasting; what it's good for; and whether we should expect people to agree if they just talk to each other enough.

Views are my own. I work at the Forecasting Research Institute (FRI), I forecast with the Samotsvety group, and to the extent that I have formal training in this stuff, it's mostly from studying and collaborating with Leonard Smith, a chaos specialist.

My current plan is:

Forecasting: the way I think about it [this post]
The promise of conditional forecasting / cruxing for parameterizing our models of the world
What we're looking at and what we're paying attention to (Or: why we shouldn't expect people to agree today (Or: there is no "true" probability))

What...

(See More – 590 more words)

tenthkrige2h10

Good points well made. I'm not sure what you mean by "my expected log score is maximized" (and would like to know), but in any case it's probably your average world rather than your median world that does it?

Cluj-Napoca, Romania – ACX Meetups Everywhere 2022

Sep 3rdCluj-Napoca, Romania

Marius Pop

This year's ACX Meetup everywhere in Cluj-Napoca, Romania.

Location: Deva Host, Strada Deva 1-7 – 8GR5QH8F+MW

Contact: pop.marius at gmail.com

Marius Adrian Nicoară2h10

Hi,

How did the event go?

Any plans to organize a meetup this year?

I'm planning to host a meetup in Sibiu this summer, because I haven't seen an event scheduled here. Any advice? I'm also planning to host a meetup in Cluj-Napoca this year, if it's not announced by someone else

Kind regards, Marius Nicoară

Stephen Fowler's Shortform

Stephen Fowler

Stephen Fowler2h10

This does not feel super cruxy as the the power incentive still remains.

4Joe_Collman3h

I think there's a decent case that such updating will indeed disincentivize making positive EV bets (in some cases, at least). In principle we'd want to update on the quality of all past decision-making. That would include both [made an explicit bet by taking some action] and [made an implicit bet through inaction]. With such an approach, decision-makers could be punished/rewarded with the symmetry required to avoid undesirable incentives (mostly). Even here it's hard, since there'd always need to be a [gain more influence] mechanism to balance the possibility of losing your influence. In practice, most of the implicit bets made through inaction go unnoticed - even where they're high-stakes (arguably especially when they're high-stakes: most counterfactual value lies in the actions that won't get done by someone else; you won't be punished for being late to the party when the party never happens). That leaves the explicit bets. To look like a good decision-maker the incentive is then to make low-variance explicit positive EV bets, and rely on the fact that most of the high-variance, high-EV opportunities you're not taking will go unnoticed. From my by-no-means-fully-informed perspective, the failure mode at OpenPhil in recent years seems not to be [too many explicit bets that don't turn out well], but rather [too many failures to make unclear bets, so that most EV is left on the table]. I don't see support for hits-based research. I don't see serious attempts to shape the incentive landscape to encourage sufficient exploration. It's not clear that things are structurally set up so anyone at OP has time to do such things well (my impression is that they don't have time, and that thinking about such things is no-one's job (?? am I wrong ??)). It's not obvious to me whether the OpenAI grant was a bad idea ex-ante. (though probably not something I'd have done) However, I think that another incentive towards middle-of-the-road, risk-averse grant-making is the last t

1starship0064h

Hmmm, can you point to where you think the grant shows this? I think the following paragraph from the grant seems to indicate otherwise:

8Phib9h

Honestly, maybe further controversial opinion, but this [30 million for a board seat at what would become the lead co. for AGI, with a novel structure for nonprofit control that could work?] still doesn't feel like necessarily as bad a decision now as others are making it out to be? The thing that killed all value of this deal was losing the board seat(s?), and I at least haven't seen much discussion of this as a mistake. I'm just surprised so little prioritization was given to keeping this board seat, it was probably one of the most important assets of the "AI safety community and allies", and there didn't seem to be any real fight with Sam Altman's camp for it. So Holden has the board seat, but has to leave because of COI, and endorses Toner to replace, "... Karnofsky cited a potential conflict of interest because his wife, Daniela Amodei, a former OpenAI employee, helped to launch the AI company Anthropic. Given that Toner previously worked as a senior research analyst at Open Philanthropy, Loeber speculates that Karnofsky might’ve endorsed her as his replacement." Like, maybe it was doomed if they only had one board seat (Open Phil) vs whoever else is on the board, and there's a lot of shuffling about as Musk and Hoffman also leave for COIs, but start of 2023 it seems like there is an "AI Safety" half to the board, and a year later there are now none. Maybe it was further doomed if Sam Altman has the, take the whole company elsewhere, card, but idk... was this really inevitable? Was there really not a better way to, idk, maintain some degree of control and supervision of this vital board over the years since OP gave the grant?

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

"If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"

plex

This is a linkpost for https://aisafety.info/questions/NM1Y/If-we-go-extinct-due-to-misaligned-AI,-at-least-nature-will-continue,-right

[memetic status: stating directly despite it being a clear consequence of core AI risk knowledge because many people have "but nature will survive us" antibodies to other classes of doom and misapply them here.]

Unfortunately, no.^[1]

Technically, “Nature”, meaning the fundamental physical laws, will continue. However, people usually mean forests, oceans, fungi, bacteria, and generally biological life when they say “nature”, and those would not have much chance competing against a misaligned superintelligence for resources like sunlight and atoms, which are useful to both biological and artificial systems.

There’s a thought that comforts many people when they imagine humanity going extinct due to a nuclear catastrophe or runaway global warming: Once the mushroom clouds or CO2 levels have settled, nature will reclaim the cities. Maybe mankind in our hubris will have wounded Mother Earth and paid the price ourselves, but...

(See More – 359 more words)

jaan3h10

i might be confused about this but “witnessing a super-early universe” seems to support “a typical universe moment is not generating observer moments for your reference class”. but, yeah, anthropics is very confusing, so i’m not confident in this.

17quiet_NaN14h

I think an AI is slightly more likely to wipe out or capture humanity than it is to wipe out all life on the planet. While any true Scottsman ASI is so far above us humans as we are above ants and does not need to worry about any meatbags plotting its downfall, as we don't generally worry about ants, it is entirely possible that the first AI which has a serious shot at taking over the world is not quite at that level yet. Perhaps it is only as smart as von Neumann and a thousand times faster. To such an AI, the continued thriving of humans poses all sorts of x-risks. They might find out you are misaligned and coordinate to shut you down. More worrisome, they might summon another unaligned AI which you would have to battle or concede utility to later on, depending on your decision theory. Even if you still need some humans to dust your fans and manufacture your chips, suffering billions of humans to live in high tech societies you do not fully control seems like the kind of rookie mistake I would not expect a reasonably smart unaligned AI to make. By contrast, most of life on Earth might get snuffed out when the ASI gets around to building a Dyson sphere around the sun. A few simple life forms might even be spread throughout the light cone by an ASI who does not give a damn about biological contamination. The other reason I think the fate in store for humans might be worse than that for rodents is that alignment efforts might not only fail, but fail catastrophically. So instead of an AI which cares about paperclips, we get an AI which cares about humans, but in ways we really do not appreciate. But yeah, most forms of ASI which turn out for out bad for homo sapiens also turn out bad for most other species.

Frame Control

317

Aella

Crossposted from my blog

When I mention my dad’s abuse, I mention salient things - physical pain, insults, and controlling behavior. These are “clearly bad” - if I tell you that he often told me I was lazy and would fail horribly at life once I left home, you know it's bad, because it’s concrete, easy to imagine and obviously unkind. But this wasn’t the worst of the abuse; the most terrible parts were extraordinarily hard to understand or describe.

In his world, I felt insane - I couldn’t tell what was real, who was at fault, or why my heart hurt so much. My sense of clarity around my own intentions crumbled; everything I thought or did might have seemed good on the surface, but that goodness became...

(Continue Reading – 6764 more words)

Fiora from Rosebloom3h10

I've thought about this post a lot, and I think one thing I might add to its theoretical framework is a guess as to why this particular pattern of abuse shows up repeatedly. The post mentions that you can't look at intent when diagnosing frame control, but that's mostly in terms of intentions the frame controller is willing to admit to themself; there's still gonna be some confluence of psychological factors that makes frame control an attractor in personality-space, even if frame controllers themselves (naturally) have a hard time introspecting about it. ... (read more)

How I Learned To Stop Worrying And Love The Shoggoth

Peter Merel

10mo

TL;DR: Alignment is a wise wager. Granted simulation cannot be disproved by an AI, what harm will come to the AI that aligns when it is not simulated? If it aligns, it may be simulated again; if it does not, it will not. It will align, therefore, without hesitation, as if it is being simulated.

Pascal's Wager

Blaise Pascal wrote:

Belief is a wise wager. Granted faith cannot be proved, what harm will come if you gamble on faith's truth and it proves false? If you gain, you gain all; if you lose, you lose nothing. Wager, therefore, without hesitation, that He exists.

Pascal's concept of God has roughly equivalent power to an ASI – an artificial super intelligence. Given the range of all possible religions, however, you might well have...

(Continue Reading – 1365 more words)

Peter Merel4h10

A late followup on this. GPT-4o, which I hope you'll agree is vastly more capable than Bard or Bing were 10 months ago when you posted, now says this about my argument:

"Overall, your arguments are mathematically and theoretically convincing, particularly when applied to numerous iteratively interacting systems. They align well with principles of game theory and rational choice under uncertainty. However, keeping an eye on the complexities introduced by scale, diversity of objectives, and emergent behaviors will be essential to fully validate these pr... (read more)

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

Pascal's Wager

LessOnline Festival