simeon_c (@SaferAI)

Comments
The Industrial Explosion
simeon_c · 3d

It might be a dumb question, but aren't there major welfare concerns with assembling biorobots?

My pitch for the AI Village
simeon_c · 6d

Thanks for asking! Somehow I had missed the story about the Wikipedia race; thanks for flagging it.

I suspect that if they tried to pursue the kinds of goals that many humans actually pursue, e.g. making as much money as possible, you might see less prosocial behavior. Raising money for charities is an unusually prosocial goal, and the fact that all the agents pursue the same goal is also an unusually prosocial setup.

My pitch for the AI Village
simeon_c · 6d

Seems right that it's overall net positive. And it does seem like a no-brainer to fund. So thanks for writing that up.

I still hope that the AI Digest team who runs it also puts some less cute goals and framings around what they report from the agents' behavior. I would like to see the agents' darker tendencies highlighted as well, e.g. cheating, instrumental convergence, etc., in a way that isn't perceived as "aw, that's cute". It could be a great testbed for explaining a bunch of concerning real-world trends.

New Endorsements for “If Anyone Builds It, Everyone Dies”
simeon_c · 11d

Consider making public a progress bar with the (approximate) number of pre-orders, with 20,000 as the end goal. Having an explicit goal that everyone can optimize for helps people get a sense of whether marginal effort is worth investing, and can motivate people to spread the word more, etc.

simeon_c's Shortform
simeon_c · 1mo

Agreed that those are complementary. I didn't mean to say that the factor I flagged is the only important one. 

simeon_c's Shortform
simeon_c · 1mo

Suggested reframing for judging AGI lab leaders: think less about what terminal values they pursue and think more about how they trade off power/instrumental goals against other values.

Claim 1: The end values of AGI lab leaders matter mostly once they have won the AGI race and crushed the competition, but much less for all the decisions leading up to that point (i.e. from now until the post-AGI world).

Claim 1bis: Additionally, in the scenario where they have no competition and are ruling the world, even someone like Sam Altman seems to have mostly good values (e.g. see his endeavours around fusion, world basic income, etc.).

Claim 2: What matters most during the AGI race (and before any decisive strategic advantage, DSA) is the propensity of an AGI lab leader to forgo an opportunity to grab more power/resources in favor of other valuable things (e.g. safety, benefit-sharing, etc.). The main reason is that at every point during the AGI race, and in particular in the late game, you can systematically get (a lot!) more expected power if you trade off safety, governance, or other valuable things. This is the main dynamic at play, and it predicts AGI labs' obsession with developing AI R&D automation first and Sama's various moves detrimental to safety.

Corollary 2a: A corollary is that many leaders sympathetic to safety (including Sama) frequently pursue Pareto-pushing safety interventions (i.e. interventions that don't reduce their power), such as good safety research. The main difficulties arise whenever safety trades off against capabilities development & power (which is unfortunately frequent).

ryan_greenblatt's Shortform
simeon_c · 1mo

For the record, I think the importance of the "intentions"/values of AGI lab leaders is overstated. What matters most in the context of AGI labs is the virtue vs. power-seeking trade-off, i.e. the propensity to make dangerous moves (/burn the commons) in order to unilaterally grab more power (in pursuit of whatever values).

Stuff like this op-ed, the broken promise of not meaningfully pushing the frontier, Anthropic's obsession with and singular focus on automating AI R&D, Dario's explicit calls to be the first to RSI AI, and Anthropic's shady policy activity have provided ample evidence that their propensity to burn the commons to grab more power (probably in the name of values I would mostly agree with, fwiw) is very high.


As a result, I now all-things-considered trust Google DeepMind slightly more than Anthropic to do what's right for AI safety. Google, as a big corporation, is less likely to make unilateral power-grabbing moves (such as automating AI R&D asap to achieve a decisive strategic advantage), is more likely to comply with regulations, and already has everything it needs to build AGI independently (compute/money/talent), so its incentives won't degrade further. Additionally, D. Hassabis has been pretty consistent in his messaging about AI risks & AI policy, e.g. about the need for an IAEA/CERN for AI; Google has been mostly scaling up its safety efforts and has produced some of the best research on AI risk assessment (e.g. this excellent paper, or this one).

Alexander Gietelink Oldenziel's Shortform
simeon_c · 4mo

I'm not 100% sure about the second factor, but the first is definitely a big one. To my knowledge, no institution is denser in STEM talent than ENS, and the elites there are extremely generalist compared to equivalent elites I've met in other countries, e.g. at MIT in the US. The core of the "Classes Préparatoires" is that it pushes even the world's best people to grind like hell for 2 years, including weekends, every evening, etc.

ENS is the result of pushing your entire elite to grind like crazy for 2 years on a range of STEM topics and then selecting the top 20 to 50.

Common misconceptions about OpenAI
simeon_c · 7mo

250 upvotes is also crazy high. Another sign of how disastrously bad the EA/LessWrong communities are at character judgment.

The same thing is happening right now, before our eyes, with Anthropic. And similar crowds are just as confidently asserting that this time they're really the good guys.

Should there be just one western AGI project?
simeon_c · 7mo

I only skimmed this, but I wanted to flag that I like Bengio's proposal of one coalition that develops several AGIs in a coordinated fashion (e.g. training runs at the same time on their own clusters), which reduces the main downside of having one single AGI project (power concentration).

Posts
A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management (4mo)
Towards Quantitative AI Risk Management (8mo)
simeon_c's Shortform (1y)
Forecasting future gains due to post-training enhancements (1y)
Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis (1y)
A Brief Assessment of OpenAI's Preparedness Framework & Some Suggestions for Improvement (1y)
Responsible Scaling Policies Are Risk Management Done Wrong (2y)
Do LLMs Implement NLP Algorithms for Better Next Token Predictions? (2y)
In the Short-Term, Why Couldn't You Just RLHF-out Instrumental Convergence? (2y)
AGI x Animal Welfare: A High-EV Outreach Opportunity? (2y)