

General alignment plus human values, or alignment via human values?

I agree with you that utility-maximizing "maximum optimization power" AIs need to have some knowledge of human values to be safe - at least enough to avoid bad side effects.

On the other hand, I think that even once you have an AI that can safely create 2 copies of the same strawberry, some problems might still be unsolved at that point, like how to aggregate the preferences of different people, how to extrapolate human values into weird situations, etc.

On the other other hand, some alignment problems seem to me to be independent of human values, like mesa-optimization or "Look where I'm pointing, not at my finger".

Maybe "to what degree is solving this subproblem of alignment necessary to have a safe strawberry cloner?" is an interesting distinction to draw.

The Machine that Broke My Heart

I'm obese and struggle with weight loss, so this is a particularly sad story to hear. My experience makes me think a lot of my issues could be improved just by having someone, like, standing by my side 24/7 going "hey" when I go to buy ice cream, stress-eat, etc.

Would you be willing to make the pieces you built available to someone who wanted to pick up where you left off? I probably wouldn't be that person, though (I don't have the spoons).

The principle of no non-Apologies

I wrote it as the sort of advice that I think might have been useful to me a couple of years back, and to counteract the specific issue of "getting cornered into conceding that you messed up even though you don't believe you did". I think it's good advice for people like past-me, but as a targeted intervention, it could be condensed into a TAP (trigger-action plan) like "about to apologize for something that was not a mess-up --> don't apologize unless you mean it". (That is a salient trigger for me, because "I'm apologizing for something that was not a mess-up" feels distinctly different to me from "I'm apologizing for something I messed up", sort of like "appeasing someone angry" vs. "asking for forgiveness".)

It's a bit unfortunate that English uses "I'm sorry" to express what many languages have 2 distinct terms for: "I apologize for messing up" and "I sympathize".

"...in a conversation where someone has recently been hurt, it is often the wrong time to be coldly pedantic."

Yeah. I haven't outright banned the words "I'm sorry" from my normal vocabulary. I will often say "I'm sorry that you're going through this" when it's contextually obvious that I'm not apologizing.

If someone is having a hard time because of something I did, but I believe that I did not act wrongly (imagine scenarios like giving people negative feedback, breaking up, defending boundaries, etc.), I avoid saying "I'm sorry", though I might say something like "I wish you weren't suffering" or "I understand this must hurt". In these situations "I'm sorry" risks being heard as "I apologize" or "I was wrong to act this way", and it's important to be able to stand your ground in them.

The principle of no non-Apologies

You're right, that didn't occur to me to mention. (My native language has separate idioms for that use.)

Growth mindset for better sex

Yeah, you're right: I'm equivocating between learning from "feedback from the listener" and "feedback from a master of the skill". Thanks for the links; I'll put them on The List.

(Hmm, now that I see them on Goodreads, they seem to be about male-female sex for male readers, which lowers their value for me since I'm pan. But then, writing gender-general sex advice is probably harder than advice for a specific combination...)

Growth mindset for better sex

Yep, some people won't share that enthusiasm, and I guess that's okay-ish. It would make long-term things harder than necessary, though.

You are an optimizer. Act like it!

When I was thinking about this, what I had in mind was "Be a smart optimizer. If the best use of your resources right now is to slack off for a day to regain energy, do that. It's better than having 15 productive minutes and then crashing."

I am really rarely in "optimizer mode", and wrote this in a moment of inspiration.

My Anki patterns

Hi Pablo. Thanks for the link to the formula; I didn't know someone had already looked into it. At some point I estimated that it cost me ~250 s to learn a card in the first year.

In the last year, it looks like I spent ~1 hour per day reviewing on average. When I'm on my desktop computer, I'll share the Anki statistics PDF.

I think one reason the formula might underestimate the cost is that I keep expanding my deck over time, so there's a mix of newer and older cards.

I'm also a bit concerned that I might not always be adding cards for what's most useful to learn, but for what's easiest to Ankify. So recently, even though I've been wanting to study more ML papers, I've mostly been adding programming cards because it's easier... :/

My Anki patterns

It's aesthetics, but more aesthetic :)
