G Gordon Worley III

Director of Research at PAISRI


Zen and Rationality
Formal Alignment
Map and Territory Cross-Posts
Phenomenological AI Alignment


Specializing in Problems We Don't Understand

Also makes me think of TRIZ. I don't really understand how to use it that well or even know if it produces useful results, but I know it's popular within the Russosphere (or at least more popular there than anywhere else).

A New Center? [Politics] [Wishful Thinking]

Political polarization in the USA has been increasing for decades, and has become quite severe. This may have a variety of causes, but it seems highly probable that the internet has played a large role, by facilitating the toxoplasma of rage to an unprecedented degree.

Contra the idea that the internet is to blame, polarization seems historically to be the "natural" state in both the USA and elsewhere. To get less of it you need specific mechanisms that have a moderating effect.

For a long time in the US this was a combination of progressive Republicans (Whigs and abolitionists) and regressive Democrats (Dixiecrats) that caused neither major party to be able to form especially polarized policy positions. Once the Civil Rights Act and Roe v. Wade drove Dixiecrats out of the Democratic party and progressives out of the Republican party, respectively, the parties became able to align more on policy.

So extending this observation, rather than a new center, maybe what we need to get less polarization is something to hold the parties together along some line that's orthogonal to policy preferences, such that both parties must tolerate a wide range of opinions. I'm not sure how to do that, as the above situation was created by the Civil War and Reconstruction, which made variously the Republican and Democratic parties unacceptable to certain voters (like former slaveholders and abolitionists), and it was only after a hundred years that identification with or against the "Party of Lincoln" melted away enough to allow a shift.

Maybe your new center idea could cause this, but I'm not reading in it a strong enough coordination mechanism to overcome the natural tendency for parties to align in opposite directions.

People Will Listen

My advice is to accept that 'haters are gonna hate' and just take the hit. Make your arguments as clear and your advice as easy to follow as possible. But understand that no matter what you do, if you tell people to buy bitcoin at $230, the top comment might be critical. Some people will listen and benefit.

I've just been thinking about this with respect to two posts I recently authored. First I wrote "Forcing yourself to keep your identity small is self-harm" and this got a bunch of negative response (e.g. currently a score of 17 with 24 votes; my guess based on watching things come in is that it's close to 50% downvotes). In response I wrote "Forcing Yourself is Self Harm, or Don't Goodhart Yourself" and so far it's doing "better" by some measures (score of 25 right now, but with only 11 votes, all positive as best I can tell).

The thing is, both posts say exactly the same thing, other than that the first post is very concretely about a particular case while the latter is a general article that covers the original article as a special case. I basically wrote the second version by taking the original text and modifying it to be explicitly generalized rather than just about one case.

Now if I ask myself which one I think is better, I actually think it's the first one even though the latter is better received in terms of karma. The second one lacks teeth and I think it's too easy to read it and not really get what it's saying in a concrete way. The reader might nod along saying "ah, yes, sage advice, I will follow it" and then promptly fail to integrate it into even a single place where it matters, whereas the former is very in your face about a single place where it matters and confronts the reader to consider that they may have been screwing up at doing a thing they value.

I like this kind of stuff that confronts readers because, although it may draw greater controversy, it also seems more likely to land for the person who will benefit from reading it, and managing criticism/downvotes only matters insofar as I draw too much negative attention and negatively impact the visibility of the post to people who would have benefitted from having seen it in a world where it was less criticized and less downvoted.

Of course, in this isolated case of two articles there are confounding factors. For example, maybe people "came around" on my arguments by the time the second post came out since they saw the first one, or maybe more people just ignored the second post since it looked so much like the first one. But I've noticed this sort of trend over and over in my own writing and the writing of others: saying something direct that challenges the reader will draw the ire of readers who dislike having been challenged on something they hold dear, while saying the same thing in a less direct way that avoids triggering their displeasure is actually worse, because it lands less well for everyone and the people who were going to criticise it now don't, without their silence meaning anything.

"Taking your environment as object" vs "Being subject to your environment"

This post points at a core of why I like to talk about the subject-object relationship with respect to developmental psychology: the shifting of things from one side of the lens of intentionality to the other seems to be the key driver of development.

"Taking your environment as object" vs "Being subject to your environment"

There's some complexity here because English offers two words, "subject" and "object", that can be used somewhat interchangeably in some situations, but in most situations we have some notion that "subject" is on the left/upstream side of the causal arrow and "object" is on the right/downstream side. However, Ben's reuse of "subject" by shifting it from actor ("subject to") to the acted upon ("as subject") seems mostly poetic and a reasonable alternative to talking about object.

Of course, because English is noun-focused, it's rather nice to have two different nouns for these concepts rather than having to point to them by using two different verb phrases as Ben does here.

I have my own mild preferences around using standard phrasing to trigger in people associations with that common body of work built around those standards, but regardless I don't think anything in the post is actually at odds with standard phrasing, just different and, to my ear, equally clear, even if I have no intention of ever copying it.

Forcing Yourself is Self Harm, or Don't Goodhart Yourself

Take your illustrative story. I'd say the problem here is not that the person is trying to focus on the narrow area of increasing productivity. It's that they picked a bad metric and a bad way of continually measuring themselves against the metric. The story just kind of glosses over what I would say is the most important part!

I'd say that 65%-75% of the problem this person has is that they apparently didn't seriously think about this stuff beforehand and pre-commit to a good strategy for measurement.

The person who looks and says "I only wrote 100 words last hour?!??!" kind of reminds me of the investor checking their stock prices every day.

For this person, three months or six months or a year might be a better time frame for checking how they're doing. Regardless, the main point I want to make is that how well this person would be able to improve themselves in this area while maintaining their well-being is largely dependent upon making good decisions on this very important question.

This is one of the weird issues with what I see as the problem I'm trying to illustrate with the story and the limitations of telling a single story about it.

What you say is true, but it's a reduction of the problem to be less bad by applying weaker optimization pressure rather than an actual elimination of the problem. Weak Goodharting is still Goodharting and it will still, eventually, subtly screw you up.

This post is also advice, and so aimed mostly at folks less like you and more like the kind of person who doesn't realize they're actively making their life worse rather than better by trying too hard.

A System For Evolving Increasingly General Artificial Intelligence From Current Technologies

Have you thought much about the safety/alignment aspects of this approach? This seems very susceptible to Goodharting.

Testing The Natural Abstraction Hypothesis: Project Intro

Nice! From my perspective this would be pretty exciting because, if natural abstractions exist, it solves at least some of the inference problem I view at the root of solving alignment, i.e. how do you know that the AI really understands you/humans and isn't misunderstanding you/humans in some way that looks like it does understand from the outside but it doesn't. Although I phrased this in terms of reified experiences (noemata/qualia as a generalization of axia), abstractions are essentially the same thing in more familiar language, so I'm quite excited for the possibility that we can prove that we may be able to say something about the noemata/qualia/axia of minds other than our own beyond simply taking for granted that other minds share some commonality with ours (which works well for thinking about other humans up to a point, but quickly runs up against problems of assuming too much even before you start thinking about beings other than humans).

Curious Inquiry and Rigorous Training

This resonates with something I've been thinking about lately. Despite getting high grades, graduating high school with an IB diploma, and going most of the way through a PhD, I was actually kind of bad at school in several ways, and one of those was that I was trying to actually learn the stuff I studied. Like a fool, you might say, I failed to realize school was a system to be gamed and tried to actually learn everything I was asked to learn for real. This was exhausting, and I dropped out of the PhD because of burnout over this as much as any other reason, having finally crossed the threshold where my intelligence couldn't beat the system without gaming it.

I find learning outside school quite different. Mostly memorizing things doesn't matter and curiosity and ability to do things matter way more. Remembering stuff helps you be fast, but natural spaced repetition of stuff you actually use often works well enough. It's a lot more fun and I'm better at it.

G Gordon Worley III's Shortform

More surprised than perhaps I should be that people take up tags right away after they're created. I created the IFS tag just a few days ago after noticing it didn't exist when I wanted to link it, and I added the first ~5 posts that came up when I searched for "internal family systems". It now has quite a few more posts tagged with it that I didn't add. Super cool to see the system working in real time!
