sanyer — LessWrong

What training data should developers filter to reduce risk from misaligned AI? An initial narrow proposal

Regarding the results of empirical evaluations of AIs' scheming-relevant capabilities, I think we could do even better than simple removal by replacing the real results with synthetic ones that intentionally mislead the AIs about their own capabilities. So, if a model is good at scheming, we could make it believe it is bad at it, and vice versa. I think this could be quite feasible: rather than coming up with fully synthetic data that may be easy to spot as synthetic, you only need to replace some specific results.

China proposes new global AI cooperation organisation

sanyer3mo81

A link to the original article would be appreciated.

Consider chilling out in 2028

sanyer4mo30

How many people are working on test-time learning? How feasible do you think it is?

jenn's Shortform

sanyer5mo32

I see. Why do you have this impression that the default algorithms would do this? Genuinely asking, since I haven't seen convincing evidence of this.

jenn's Shortform

sanyer5mo10

I don't know, the obviously wrong things you see on the internet seem to differ a lot depending on your recommendation algorithm. The strawmanny SJW takes you list are mostly absent from my feed. In contrast, I see LOTS of absurd right-wing takes.

Guide To The Less Wrong Editor

sanyer5mo10

The links to subsections in the table of contents seem to be broken.

LessWrong Feed [new, now in beta]

sanyer5mo30

It's a Galaxy A54.

I'm not sure how to share screenshots on mobile on LW 😅

LessWrong Feed [new, now in beta]

sanyer5mo32

The idea seems cool, but the feed doesn't work well on my phone. It cuts off the sides of the text, which makes things unreadable. (I have a Samsung.)

European Links (18.05.25)

sanyer6mo52

Now, the EU itself needs some reforms badly, namely, as Draghi report suggests, relaxing the regulation, but there seems no political will to do that. At least, last time I’ve checked I have still seen those annoying “accept cookies” banners alive and kicking.

This is not true; there is a lot of political will for deregulation and simplification (see e.g. here). Everyone is talking about it in Brussels.

I assume the point about "accept cookies" banners was a joke, but just in case it wasn't: it takes time for regulations to be changed, so the fact that we still see the "accept cookies" banners offers no evidence that the EU is not taking deregulation seriously. (A separate question is whether getting rid of those banners or other GDPR rules would boost competitiveness; I suspect it wouldn't.)

Also, IMO the most important reforms we need are not about regulation, but about harmonizing standards across the EU and creating a true single market.

Wei Dai's Shortform

sanyer6mo10

I would expect higher competence in philosophy to reduce overconfidence, not increase it? The more you learn, the more you realize how much you don't know.

