Comments

Announcing the Inverse Scaling Prize ($250k Prize Pool)

Would the ability to deceive humans when specifically prompted to do so be considered an example? I would think that large LMs get better at devising false stories about the real world that people could not distinguish from true stories.

On A List of Lethalities

On the idea that "we can't just choose not to build AGI": it seems like much of the concern here is predicated on the idea that so many actors are not taking safety seriously that someone will inevitably build AGI once the technology has advanced sufficiently.

I wonder if struggles with AIs that are strong enough to cause a disaster but not strong enough to win instantly may change this perception? I can imagine there being very little gap, if any, between those two types of AI if there is a hard takeoff, but to me it seems quite possible for there to be some time at that stage. Some sort of small or moderate disaster with a less powerful AI might get all the relevant players to realize the danger. Once a danger is that obvious, humans have done reasonably well at not doing things that seem very likely to destroy the world immediately (e.g. nuclear war).

Though we've been less good at putting good safeguards in place to prevent it from happening. And even if all groups that could create AI agree to stop, eventually someone will think they know how to do it. And we still only get the one chance. 

All that is to say I don't think it's implausible that we'll be able to coordinate well enough to buy more time, though it's unclear whether that will do much to avoid eventual doom.

Fixed points and free will

I feel like a lot of the angst about free will boils down to conflicting intuitions. 

  1. It seems like we live in a universe of cause and effect, thus all my actions/choices are caused by past events.
  2. It feels like I get to actually make choices, so 1. obviously can't be right.

The way to reconcile these intuitions is to recognize that yes, all the decisions you make are in a sense predetermined, but a lot of what determines those decisions is who you are and what sort of thing you would do in a particular circumstance. You are making decisions; that experience is not invalidated by a fully deterministic universe. It's just that you are who you are, and you'll make the decision that you would make.

Genetic Enhancement: a Strategy for Long(ish) AGI Timeline Worlds

That’s true; however, there was a huge amount of outrage even before those details came out.

The Efficient LessWrong Hypothesis - Stock Investing Competition

I of course don’t have insider information. My stance is something close to Buffett’s advice: “be fearful when others are greedy, and greedy when others are fearful”. I interpret that as saying that markets tend to overreact, and that if you go by the fundamentals representing the value of a stock you can potentially outperform the market in the long run. To your questions: yes, disaster may really occur, but my opinion is that these risks are not sufficient to pass up the value here. I’ll also note that Charlie Munger has been acquiring a substantial stake in BABA, which makes me more confident in its value at its current price.

The Efficient LessWrong Hypothesis - Stock Investing Competition

Alibaba (BABA) - the stock price has been pulled down by fear about regulation, delisting, and most recently instability in China as its zero-COVID policy fails. However, as far as I can tell, the price is insanely low for the amount of revenue Alibaba generates and the market share that it holds in China.

Genetic Enhancement: a Strategy for Long(ish) AGI Timeline Worlds

Current bioethics norms strongly condemn this sort of research, which may make it challenging to pursue in the nearish term. The consensus is strongly against it, which will make acquiring funding difficult, and any human CRISPR editing is completely off the table for now. For example, He Jiankui used CRISPR to edit some babies in China to make them less susceptible to HIV and went to prison for it.

Strategies for keeping AIs narrow in the short term

Do I understand you correctly as endorsing something like: it doesn’t matter how narrow an optimization process is; if it becomes powerful enough and is not well aligned, it still ends in disaster?

What I Was Thinking About Before Alignment

I’m not sure the problem in biology is decoding, at least not in the same sense it is with neural networks. I see the main difficulty in biology more as one of mechanistic inference, where a major roadblock may be getting better measurements of what is going on in cells over time. I doubt some algorithm is just going to overcome the fact that biological data gives you both very high levels of molecular noise and single snapshots in time that are difficult to place in context. With a neural network you have the parameters, and it seems reasonable to say you just need some math to make it more interpretable.

Whereas in biology I think we likely need both better measurements and better tools. I’m not sure the same tools would be particularly applicable to the AI interpretability problem either.

If, for example, I managed to create mathematical tools to reliably learn mechanistic dependencies between proteins and/or genes from high-dimensional biological data sets, it’s not clear to me that they would be easily applicable to extracting Bayes nets from large neural networks.

I’m coming at this from a comp bio angle so it’s possible I’m just not seeing the connections well, having not worked in both fields.

Are there substantial research efforts towards aligning narrow AIs?

In general, the observation from working in the field is that if you have a simple metric, people will figure out how to game it. So you need to build in a lot of safeguards, and you need to evolve all the time as the spammers/abusers evolve. There's no end point, no place where you think you're done, just an ever-changing competition.

 

That's what I was trying to point at with regard to the problem not being patchable. It doesn't seem like there is some simple patch you can write and then be done. A solution that would work more permanently seems to have some of the "impossible" character of AGI alignment, and trying to solve it on that level seems like it could be valuable for AGI alignment researchers.
