Comments

I'm pretty sure that, at some level, what sorts of things your brain spits out into your consciousness, and how useful that information is in a given situation, is something you can't fundamentally change. I expect this to be a hard-coded algorithm.

Tune Your Cognitive Strategies purports to offer a technique which can improve that class of algorithm significantly.

Edit: Oh, no, you meant a different thing, and this probably goes into the "inputs to the algorithm" category?

Your probabilities are not independent; your estimates mostly flow from a world model which seems to me to be flatly and clearly wrong.

The plainest examples seem to be assigning:

- We invent a way for AGIs to learn faster than humans: 40%
- AGI inference costs drop below $25/hr (per human equivalent): 16%

despite current models already learning vastly faster than humans (an LLM's training run is far shorter than a human lifetime and covers vastly more data), current systems nearing AGI, and inference already being dramatically cheaper and still plummeting with algorithmic improvements. There is a general factor of progress, where progress leads to more progress, which you seem to be missing from the positive factors. On the negative side, a derailment that delays things enough to push us out that far would need to be extreme, on the order of a full-scale nuclear exchange, given more reasonable models of progress.

I'll leave you with Yud's preemptive reply:

Taking a bunch of numbers and multiplying them together causes errors to stack, especially when those errors are correlated.
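To make the stacking concrete, here is a toy sketch with numbers of my own (not taken from the thread): even a modest, correlated downward bias in each stage estimate compounds into a large error in the joint probability.

```python
import math

# Toy illustration: ten conjunctive stages, all multiplied together.
true_p   = [0.9] * 10   # suppose each stage truly has probability 0.9
biased_p = [0.8] * 10   # each estimate biased slightly low (correlated pessimism)

true_joint   = math.prod(true_p)    # 0.9**10 ≈ 0.349
biased_joint = math.prod(biased_p)  # 0.8**10 ≈ 0.107

# A ~0.1 per-stage underestimate becomes a >3x error in the product.
print(round(true_joint, 3), round(biased_joint, 3))
```

Because the per-stage errors all point the same way, they never cancel; the final figure mostly reflects the shared bias of the world model, not ten independent judgments.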

Nice! Glad to see more funding options entering the space, and excited to see the S-process rolled out to more grantmakers.

Added you to the map of AI existential safety.


One thing which might be nice as part of improving grantee experience would be allowing applications to be submitted as a Google Doc (with a template giving the sections you list) rather than only by form. This increases reusability and decreases stress, since it's easy to make updates later, so there's less worry about having missed something crucial. Might be more hassle than it's worth, though, depending on how entangled your backend is with Airtable.

Cool! Feel free to add it with the form.

That was me, for context:

The core claim seems reasonable and worth testing, though I'm not very hopeful that it will reliably scale through the sharp left turn.

My guess is that the intuitions won't hold in the new domain, and that radical superintelligence requires intuitions you can't develop on relatively weak systems, but it's a source of data for our models of intuitions which might help with other things, so it seems reasonable to attempt.

Meta's previous LLM, OPT-175B, seemed good by benchmarks but was widely agreed to be much, much worse than GPT-3 (not even necessarily better than GPT-NeoX-20B). It's an informed guess, not a random dunk, and does leave open the possibility that they've turned it around and have a great model this time rather than something which goodharts the benchmarks.

This is a Heuristic That Almost Always Works, and it's the one most likely to cut off our chances of solving alignment. Almost all clever schemes are doomed, but if we as a community let that meme stop us from assessing the object-level question of how (and whether!) each clever scheme is doomed, then we are guaranteed not to find one that works.

Security mindset means looking for flaws, not assuming all plans are so doomed you don't need to look.

If this is, in fact, a utility function which, if followed, would lead to a good future, that is concrete progress: it lays out a new set of true names as a win condition. Not a solution, since we can't train AIs with arbitrary goals, but it's progress in the same way that quantilizers were progress on mild optimization.

Not inspired by them, no. Those did not have, as far as I'm aware, a clear outlet for use of the outputs. We have a whole platform we've been building towards for three years (starting on the FAQ long before those contests), and, thanks to Rob Miles, the ability to point large numbers of people at that platform once it has great content.

As I said over on your Discord, this feels like it has a shard of hope, and the kind of thing that could plausibly work if we could hand AIs utility functions.

I'd be interested to see the explicit breakdown of the true names you need for this proposal.

Agreed, incentives probably block this from being picked up by megacorps. At one point, when Musk was talking about bots a lot, I thought about trying to get his Twitter to adopt it; it would be very effective, but it doesn't allow rent extraction in the way the solution he settled on (paid Twitter Blue) does.

Websites which have the slack to let users improve their experience even at a cost to engagement might be better adopters; LessWrong has shown it will do this, e.g. by batching karma updates daily by default to avoid dopamine addiction.
