Comments

Whoops, time to edit that :D

Got the "My experiences are universal" going on in here

> Don’t think about big words or small words—think about which particular word you mean.

I think this is good advice for most people who are used to being pushed by school systems to use big words, but I personally follow a different version of it.

I see compression as an important feature of communication, and there are always tradeoffs to be made between

  1. Making my sentence short, and
  2. Conveying exactly what I want to say.

And sometimes I settle for transferring a "good enough" version of my idea, because communicating all the hairy details takes too much time / energy / social credit. I'm always scared of taking up too much of people's attention or overrunning their working memory.

  • AI is pretty safe: unaligned AGI has a mere 7% chance of causing doom, plus a further 7% chance of causing short term lock-in of something mediocre
  • Your opponent risks bad lock-in: If there’s a ‘lock-in’ of something mediocre, your opponent has a 5% chance of locking in something actively terrible, whereas you’ll always pick the good mediocre lock-in world (and mediocre lock-ins are either 5% as good as utopia or -5% as good)
  • Your opponent risks messing up utopia: In the event of aligned AGI, you will reliably achieve the best outcome, whereas your opponent has a 5% chance of ending up in a ‘mediocre bad’ scenario then too.
  • Safety investment obliterates your chance of getting to AGI first: moving from no safety at all to full safety means you go from a 50% chance of being first to a 0% chance
  • Your opponent is racing: Your opponent is investing everything in capabilities and nothing in safety
  • Safety work helps others at a steep discount: your safety work contributes 50% to the other player’s safety
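
To get a feel for how these quoted numbers combine, here's a toy expected-value sketch in Python. Only the listed percentages come from the setup above; everything else (the linear drop in win probability, treating the winner's safety level as their chance of aligned AGI, valuing doom at 0 and the benign remainder as utopia) is my own illustrative assumption, not the original model.

```python
# Toy expected-value sketch of the race setup quoted above, just to see how the
# listed percentages interact. Everything beyond those percentages is my own
# illustrative assumption, not the original model.

def expected_value(my_safety: float) -> float:
    """Expected outcome for 'you', on a scale where utopia = 1.0."""
    # "Safety investment obliterates your chance of getting to AGI first":
    # assume win probability falls linearly from 50% (no safety) to 0% (full safety).
    p_me_first = 0.5 * (1.0 - my_safety)

    # "Your opponent is racing", but "safety work helps others at a steep discount":
    # the opponent does no safety work of their own and inherits 50% of yours.
    opponent_safety = 0.5 * my_safety

    def outcome(winner_safety: float, i_won: bool) -> float:
        # Assumption: the winner's safety level is their chance of aligned AGI.
        p_aligned = winner_safety
        # Aligned AGI: I always hit the best outcome; the opponent has a 5%
        # chance of a 'mediocre bad' result (-0.05) even then.
        aligned = 1.0 if i_won else 0.95 * 1.0 + 0.05 * (-0.05)
        # Unaligned AGI: 7% doom (valued 0 here), 7% mediocre lock-in
        # (+0.05 if I pick it; the opponent picks -0.05 with probability 5%),
        # and the remaining 86% is treated as fine (1.0).
        lock_in = 0.05 if i_won else 0.95 * 0.05 + 0.05 * (-0.05)
        unaligned = 0.07 * 0.0 + 0.07 * lock_in + 0.86 * 1.0
        return p_aligned * aligned + (1.0 - p_aligned) * unaligned

    return (p_me_first * outcome(my_safety, True)
            + (1.0 - p_me_first) * outcome(opponent_safety, False))

for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"my safety = {s:.2f}  ->  expected value = {expected_value(s):.3f}")
```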

This is more a personal note / call for somebody to examine my thinking processes, but I've been thinking really hard about putting hardware security methods to work. Specifically, spreading knowledge far and wide about how to:

  1. allow hardware designers / manufacturers to have easy, total control over who uses their product, for what, and for how much, throughout the supply chain
  2. make AI-related data (including e.g. model weights and architecture) easy to secure and difficult to steal.

This sounds like it would improve every aspect of the race-y environment conditions, except:

> Your opponent is racing: Your opponent is investing everything in capabilities and nothing in safety

The exact effect of this is unclear. On the one hand, if race-y, zero-sum-thinking actors learn that you're trying to "restrict" or "control" the AI hardware supply, they'll totally amp up their efforts. On the other hand, you've also given them one more thing to worry about (their hardware supply).

I would love to get some frames on how to think about this.

Great! Now, LessWrong uses markdown syntax, which means you can do

# Section

## SubSection

### Sub Sub Section

Consider using this in your LessWrong and Substack posts. It would greatly help readability and make it easier for your readers to engage with your work.

Welcome to LessWrong!

This post is missing a TL;DR at the beginning, and section titles (section TL;DRs, basically) in between.

That sounds reasonable! Thanks for the explanation!

I'm new to alignment and I'm pretty clueless.

What's Ought's take on the "stop publishing all capabilities research" stance that e.g. Yudkowsky is taking in this tweet? https://twitter.com/ESYudkowsky/status/1557184416786423809

25, 50, 75:

I'm thinking that, just like how it can infer whether a distribution is normal or lognormal, we could use whichever bell-curve-shaped distribution gives the closest approximation.

More generally, it'd be awesome if there were a way to get the max-entropy distribution given a bunch of statistics, like quantiles, or n samples with a min and max.
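
For concreteness, here's a rough sketch (in Python rather than Squiggle, and not a claim about how Squiggle actually works) of the kind of closest-fit idea I mean: pick the normal or lognormal whose quartiles match the given ones. The function names and example numbers are made up for illustration.

```python
# Rough sketch of the "closest bell curve" idea: pick the normal (or lognormal)
# whose 25th/50th/75th percentiles match the ones given. Python/scipy for
# illustration only -- this is not how Squiggle actually does it.
from math import log
from scipy.stats import norm

def fit_normal_from_quartiles(q25, q50, q75):
    """Return (mu, sigma) of the normal whose quartiles match q25, q50, q75."""
    mu = q50                                     # the median of a normal is its mean
    sigma = (q75 - q25) / (2 * norm.ppf(0.75))   # IQR -> sigma (ppf(0.75) ~ 0.674)
    return mu, sigma

def fit_lognormal_from_quartiles(q25, q50, q75):
    """Same idea on the log scale, for positive, right-skewed quantities."""
    return fit_normal_from_quartiles(log(q25), log(q50), log(q75))

# Hypothetical example numbers:
print(fit_normal_from_quartiles(10, 20, 30))     # symmetric quartiles -> normal fits
print(fit_lognormal_from_quartiles(10, 20, 35))  # skewed quartiles -> lognormal fits better
```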

For to(a,b), is there a way to specify other confidence intervals?

 

E.g. let's say I have the 25, 50, 75 percentiles, but not the 5 and 95 percentiles?
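
One possible workaround, sketched below under a normality assumption (I don't know whether Squiggle supports anything like this natively): recover mu and sigma from the quartiles, then hand to(a, b) the implied 5th and 95th percentiles. The numbers are made up.

```python
# Sketch: derive the implied 90% interval from 25/50/75 percentiles under a
# normality assumption; the result could then be passed to to(a, b).
from scipy.stats import norm

q25, q50, q75 = 10, 20, 30                   # the percentiles I actually have
mu = q50
sigma = (q75 - q25) / (2 * norm.ppf(0.75))
a = mu + sigma * norm.ppf(0.05)              # implied 5th percentile
b = mu + sigma * norm.ppf(0.95)              # implied 95th percentile
print(a, b)                                  # these could then go into to(a, b)
```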
