PoignardAzur


Do you think there's some potential for applying the skills, logic, and values of the rationalist community to issues surrounding prison reform and helping predict better outcomes?

Ha! Of course not.

Well, no, the honest answer would be "I don't know, I don't have any personal experience in that domain". But the problems I have cited (lack of budget, the general population actively wanting conditions not to improve) can't be fixed with better data analysis.

From anecdotes I've heard from civil servants, directors love new data analysis tools, because they promise to improve outcomes without a budget raise. Staff hate new data analysis tools, because they represent more work without a budget raise, and staff desperately want the budget raise.

I mean, yeah, rationality and thinking hard about things always helps on the margin, but it doesn't compensate for a lack of budget or political goodwill. The secret ingredients to make a reform work are money and time.

Good summary of beliefs I've had for a while now. I feel like I should come back to this article at some point to unpack some of the things it mentions.

I've tried StarCoder recently, though, and it's pretty impressive. I haven't yet tried to really stress-test it, but at the very least it can generate basic code with a parameter count way lower than Copilot's.

Similarly, do you have thoughts on AISafety.info?

Quick note on AISafety.info: I just stumbled on it and it's a great initiative.

I remember pitching an idea for an AI Safety FAQ (which I'm currently working on) to a friend at MIRI and him telling me "We don't have anything like this, it's a great idea, go for it!"; my reaction at the time was "Well I'm glad for the validation and also very scared that nobody has had the idea yet", so I'm glad to have been wrong about that.

I'll keep working on my article, though, because I think the FAQ you're writing is too vast; it may not have enough punch to be compelling for most people.

Would love to chat with you about it at some point.

I think this is a subject where we'd probably need to hash out a dozen intermediary points (the whole "inferential distance" thing) before we could come close to a common understanding.

Anyway, yeah, I get the whole not-backing-down-to-bullies thing; and I get being willing to do something personally costly to avoid giving someone an incentive to walk over you.

But I do think you can reach a stage in a conversation, the kind that inspired the "someone's wrong on the internet" meme, where all that game theory logic stops making sense and the only winning move is to stop playing.

Like, after a dozen back-and-forths between a few stubborn people who absolutely refuse to cede any ground, especially people who don't think they're wrong or see themselves as bullies... what do you really win by continuing the thread? Do you really leave outside observers with the feeling that "Duncan sure seems right in his counter-counter-counter-counter-rebuttal, I should emulate him" if you engage the other person point-by-point? Would you really encourage a culture of bullying and using-politeness-norms-to-impose-bad-behavior if you instead said "I don't think this conversation is productive, I'll stop now"?

It's like... if you play an iterated prisoner's dilemma, and every player's strategy is "tit-for-tat, always, no forgiveness", and there's any non-zero likelihood that someone presses the "defect" button by accident, then over a sufficient period of time the steady state will always be "everybody defects, forever". (The analogy isn't perfect, but it's an example of how game theory changes when you play the same game over lots of iterations.)

(And yes, I do understand that forgiveness can be exploited in an iterated prisoner's dilemma.)
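The dynamic is easy to check in a toy simulation. Here's a minimal sketch (my own illustrative code, with "no forgiveness" modeled as a grim-trigger variant of tit-for-tat: defect forever once you've seen the other side defect), showing that with even a small accidental-defection rate, mutual defection becomes the absorbing state:

```python
import random

def simulate(rounds=1000, error=0.01, seed=0):
    """Two unforgiving players: each cooperates until it has ever
    seen the other defect, then defects for the rest of the game.
    With probability `error`, a player's intended move is flipped."""
    rng = random.Random(seed)
    grudge_a = grudge_b = False  # has each player ever seen a defection?
    history = []
    for _ in range(rounds):
        move_a = 'D' if grudge_a else 'C'
        move_b = 'D' if grudge_b else 'C'
        # Noise: each intended move flips by accident.
        if rng.random() < error:
            move_a = 'C' if move_a == 'D' else 'D'
        if rng.random() < error:
            move_b = 'C' if move_b == 'D' else 'D'
        # Grudges are permanent: no forgiveness.
        if move_b == 'D':
            grudge_a = True
        if move_a == 'D':
            grudge_b = True
        history.append((move_a, move_b))
    return history

hist = simulate()
tail = hist[-100:]
# Once a single accidental defection has occurred, the last rounds
# are overwhelmingly mutual defection (minus residual noise).
print(sum(1 for a, b in tail if a == b == 'D'))
```

With plain tit-for-tat (copy the opponent's last move) instead of a permanent grudge, a single accident produces endless alternating retaliation rather than immediate mutual defection, which is the same point in slow motion.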

My objection is that it doesn't distinguish [unpleasant fights that really should in fact be had] from [unpleasant fights that shouldn't].

Again, I don't think I have a sufficiently short inferential distance to convince you of anything, but my general vibe is that, as a debate gets longer, the line between the two starts to disappear.

It's like... Okay, another crappy metaphor: a debate is like photocopying a sheet of paper and adding notes to it. At first you have a very clean paper with legible things drawn on it. But as the debate progresses and you get a photocopy of a photocopy of a photocopy, you end up with something that has more noise from the photocopying artifacts than signal from what anybody wrote on it twelve iterations ago.

At that point, no matter how much the fight should be had, you're not waging it efficiently by participating.

I don't know much of the prison system in France, but your description definitely hit the points I was familiar with: the overcrowding, the general resentment the population has for any measure of dignity the system can give to inmates, the endemic lack of budget, and the magistrates trying to make the system work despite a severe lack of good options.

Good writeup.

I mean, seeing some of those discussion threads Duncan and others were involved in... I'd say it's pretty bad?

To me at least, it felt like the threads were incredibly toxic given how non-toxic this community usually is.

(Coming here from the Duncan-and-Said discussion)

I love the term "demon thread". Feels like a good example of what Duncan calls a "sazen", as in a word for a concept that I've had in mind for a while (discussion threads that naturally escalate despite the best efforts of everyone involved), but having a word for it makes the concept a lot more clear in my mind.

I think this is extremely standard, central LW skepticism in its healthy form.

Some things those comments do not do: [...]

I think that's a very interesting list of points. I didn't like the essay at all, and the message didn't feel right to me, but this post right here makes me a lot more sympathetic to it.

(Which is kind of ironic; you say this comment is dashed off, and you presumably spent a lot more time on the essay; but I'd argue the comment conveys a lot more useful information.)

It feels like the implicit message here is "And therefore we might coordinate around an alignment solution where all major actors agree to only train NNs that respect certain rules", which... really doesn't seem realistic, for a million reasons?

Like, even assuming major powers can agree to an "AI non-proliferation treaty" with specific metrics, individual people could still bypass the treaty with decentralized GPU networks. Rogue countries could buy enough GPUs to train an AGI, disable the verification hardware and go "What are you gonna do, invade us?", under the assumption that going to war over AI safety is not going to be politically palatable. Companies could technically respect the agreed-upon rules but violate the spirit in ways that can't be detected by automated hardware. Or they could train a perfectly-aligned AI on compliant hardware, then fine-tune it in non-aligned ways on non-compliant hardware for a fraction of the initial cost.

Anyway, my point is: any analysis of a "restrict all compute everywhere" strategy should start by examining what it actually looks like to implement that strategy, what the political incentives are, and how resistant that strategy will be to everyone on the internet trying to break it.

It feels like the authors of this paper haven't even begun to do that work.
