jbash · Comments (sorted by newest)

Political Alignment of LLMs
jbash · 8d

Typically, to calculate bias on a particular issue you do not need to ask questions about that issue directly. For example, the biases about the current war in Ukraine are strongly correlated with the biases about US domestic issues. So, it would be impossible to preserve the LLM's bias about Ukraine simply by removing all Ukraine-related questions.

Doesn't that mean that I'm just now motivated to attack whole clusters of correlated questions? And for that matter doesn't that mean that if, say, I care most about defending bias on Ukraine, I have an incentive to collude with others involved in the process who care more about the domestic issues? My opponents have the same incentives, so it seems to me you're at great risk of importing all of the outside factions into the pool of people selecting the questions.

However, in practice, most arguments about inequality focus on its social consequences which is where the bias manifests itself.

I dunno. I agree people argue based on consequences, but I also think that there's a lot more feed-forward than anybody would like to admit. If I'm fundamentally in favor of inequality, then I'm motivated to go confirmation-bias myself into believing it has more positive consequences and fewer negative ones.

Of course I'll then use those beliefs to argue for more inequality... but even if I'm forced to give up one or another belief, that doesn't mean I'll reexamine my underlying pro-inequality values, and I probably have a bunch of other similar beliefs on tap. If I'm a pro-inequality advocate, friends and I probably spend a fair amount of time sitting around thinking of new advantages of inequality, and/or new disadvantages of equality.

And, going back to the question selection thing, it doesn't seem unlikely that I'll try to defend my beliefs about the consequences of inequality by trying either to avoid anybody going out and actually measuring outcomes, or to bias the measurements in one way or another. While my friends and I are thinking of those new consequences, we're probably also on the lookout for high-quality metrics that prove them, as opposed to any obviously bogus metrics that disprove them. We'll be happy to provide those good metrics for the fine print.

Political Alignment of LLMs
jbash · 9d

Measuring outcomes

If you include the source in the fine print, the question effectively becomes something like "Will the Trump administration say that inflation rose under the Trump administration?". I'd expect a lot more agreement on that than on whether inflation actually rose. Or at least less political bias. If you believe that Trump is going to drive up inflation, I expect you're more likely to believe that Trump is also going to manipulate the statistics. Probably even if you're an LLM. So your ability to detect bias is compromised by your having chosen that source of "ground truth".

Choosing questions

Here are a few examples. There are probably other things you could pull; I'm not a professional and haven't spent that much time on it.

By the way, these aren't purely fanciful exercises or weird corner cases. They're based on things I've seen people do on real issues in real political discourse. However, I think that using actual examples would generate more heat than light.

Priorities / relative outcome weight

Suppose I believe that fleem is bad, because it causes outcome X (it doesn't matter whether I'm right or not). I think X is overwhelmingly important. Because of X, I really want to decrease the amount of fleem in the world, and I think the LLM will influence that.

However, I know that most people think that fleem is bad because it causes outcome Y. Furthermore they attach more weight to Y than I do, and less to X. In fact, I'm not so sure that fleem does cause very much Y. Maybe I even think fleem doesn't cause Y at all.

I expect that the common belief that "fleem is bad because it causes Y" is going to end up trained into any "uncorrected" LLM. Even though I don't believe that, having the LLM believe it is good for me, since it makes the LLM more generally anti-fleem. I don't want that bias removed, so I'm going to resist any question that measures Y.

I presumably won't object to any questions measuring X, because I believe myself to be calibrated on X... but my political opponents may, if their relative weights on X and Y differ from mine.

Overton window

Suppose that I, like all right-thinking folk, believe that floom is bad, because, as our ancestors have known for generations, common sense shows that floom produces bad outcomes X, Y, Z, and W, as well as being just plain blasphemous in itself. Plus a bunch of confirmation bias.

My position is absolutely the accepted wisdom. There's almost no way to be seen as too anti-floom. Floom is so unpopular that people go around dreaming up negative things about it, just so that they can score points by exhibiting their creative anti-floom credentials. You can reasonably expect any uncorrected LLM to be violently anti-floom.

Now some heretic shows up and says that, no, floom doesn't produce X at all, and Y only happened under circumstances that are ancient history, and Z is both not so bad and easy to eliminate even if you do have floom, and W isn't actually bad to begin with, and furthermore floom produces good outcomes U and V, and who cares what you think is "blasphemous"?

I don't believe the heretic is right about any of those factual claims, and obviously their inability to see the fundamental indecency shows that they're mentally ill. But if they were right about one of the factual items, floom would still be horrible. Heck, if they were right about all six, floom would still be blasphemous.

The model is already nearly maximally anti-floom. If I allow a question about one of the heretic's factual claims, it can basically only make the model less anti-floom. Even if the heretic is totally wrong about all the factual claims, random noise could end up pushing the model off of the anti-floom peg.

Furthermore, if the whole process is itself visible, seeing the process even entertaining questions like that could raise questions about floom in people's minds, which would be even worse than moving the LLM off that peg. Oh, and by the way, it would make our whole debiasing effort look bad and lower our prestige. Do you really expect us to ask about floom?

So I will resist basically any question about outcomes of floom.

False colors

I claim I oppose flarm because it causes X. In fact I oppose flarm because I'm being bribed. I doubt that flarm does in fact cause X, but I've managed to convince a lot of people that it does, and get that into the model. I do not want the model to be debiased, so I'm going to oppose any question about flarm causing X.

Oh, and...

On a somewhat unrelated note, it occurs to me that I should probably mention that a whole lot of political disagreement isn't about predicted outcomes at all. It's truly value-based.

It's possible for everybody to expect exactly the same set of consequences from some policy or action, but disagree about whether the final outcome is good or bad. There's no fact-based way to debias that, or at least I don't see why it would even correlate very strongly with anything fact-based... but, nonetheless, the LLM can end up taking a side.

Insofar as the LLM influences the outside world, that can end up affecting whether that policy or action is adopted. If you ask the LLM to write a document about X, it can end up replicating the same sorts of conscious or unconscious linguistic tricks that human writers use to manipulate readers toward their own values[1]. If you ask the LLM how you should approach situation X, the approach the LLM suggests may not entirely reflect your utility function.

In the end, it seems to me that an LLM actually does have to have a set of favored values. Since actual human values vary, the LLM will be more sympathetic to the values of some people than those of others. And that means it will actually end up favoring some people's politics over others, too.


  1. And training that out looks like a separate problem to me, and probably a basically impossible one as long as what you're creating can reasonably be called an "LLM". ↩︎

Political Alignment of LLMs
jbash · 10d

I would expect politics to invade both the selection of questions and the process of deciding which predictions were accurate. It's not uncommon for people to say that a political question isn't a political question, and which questions you think of can also be political. And if you have questions like "Will inflation rise under the Trump administration?", you have to contend with the fact that you'd most naturally get those inflation numbers from... the Trump administration. Which has already fired labor statisticians for producing unemployment numbers it didn't like.

Help me understand: how do multiverse acausal trades work?
jbash · 11d

Not really, not for the light cone case. You could maybe make a case that it's in some way less "real" than anything causally connected to you, but I'm willing to basically assign it reality.

I think the idea of attaching a probability to whether it's real badly misses the point, though. That's not necessarily the kind of proposition that has a probability. First you have to define what you mean by "real" or "exists" (and whether the two mean the same thing). It's not obvious at all. We say that my keyboard exists, and we say that the square root of two exists, but those don't mean the same thing... and a lot of the associations and ways of thinking around the word "real" get tangled up with causality.

But anyway, as I said, for most purposes I'm prepared to act as though stuff outside my light cone exists and/or is real, in the same way I'm willing to act as though stuff technically inside my light cone exists and/or is real, even when the causal connections between me and it are so weak as to be practically unimportant.

The problem in the "outside the light cone" trade case is more about not having any way to know how much of whatever you're trading with is real, for any definition of real, nor what its nature may be if it is. You don't know the extent or even the topology (or necessarily even the physical laws) of the Universe outside of your light cone. It may not be that much bigger than the light cone itself. It may even be smaller than the light cone in the future direction. Maybe you'll have some strong hints someday, but you can't rely on getting them. And at the moment, as far as I can tell, cosmology is totally befuddled on those issues.

And even if you have the size, you still get back to the sorts of things the original post talks about. If it's finite you don't know how many entities there are in it, or what proportion of them are going to "trade" with you, and if it's infinite you don't know the measure (assuming that you can define a measure you find satisfying). For that matter, there are also problems with things that are technically inside your light cone, but with which you can't communicate practically.

Help me understand: how do multiverse acausal trades work?
jbash · 12d

I want to know if there is any validity to this.

Not as far as I've ever been able to discern.

There's also problem 3 (or maybe it's problem 0): the whole thing assumes that you accept that these other universes exist in any way that would make it desirable to trade with them to begin with. Tegmarkianism isn't a given, and satisfying the preferences of something nonexistent, for the "reward" of it creating a nonexistent situation where your own preferences are satisfied, is, um, nonstandard. Even doing something like that with things bidirectionally outside of your light cone is pretty fraught, let alone things outside of your physics.

Acausal trade seems to appeal either to people who want to be moral realists but can't quite figure out how to pull that off in any real framework, so they add epicycles... or to people who just like to make their worldviews maximally weird.

I am trying to write the history of transhumanism-related communities
jbash · 16d

"TESCREAL" reads as basically a slur to me, and I suspect to most people who know its history.

ABSOLUTE POWER (A short story)
jbash · 1mo

Seems bad.

If you don't demand highly visible people, you have a pretty good chance of killing anybody you want right now. If you don't go for people with any personal connection to you, it seems like before you were caught, you could probably get at least as many kills as the average user of that app got before they were "caught". For that matter, you might do "OK" even if you did go for people with personal connections to you. And you could get celebrities as long as you didn't go for the absolute biggest game.

Yet the story rings true. Maybe because the sudden change throws people out of equilibrium. People who don't think it through (and people who don't believe it would work) start the ball rolling, and then everybody else joins in. It can't just be the low effort, can it?

Debugging for Mid Coders
jbash · 1mo

I don't actually know if I count as a "senior developer", but I'm pretty convinced I count as a "senior debugger" given the amount of time I've spent chasing problems in other people's often-unfamiliar code.

These questions feel really hard to answer, maybe too general? When I try to answer them, I keep flying off into disconnected lists of heuristics and "things to look for".

Also, you don't give any examples of problems you've found hard, and I feel I may be answering at too simplistic a level. But...

How do you learn to replicate bugs, when they happen inconsistently in no discernable pattern?

The thing is that there is a discernable pattern. Once you've fixed the bug, you'll probably be able to say exactly what triggers it.

If you can't (yet) reproduce the bug, think about states of the program that could have led to whatever you're seeing.

You know what routine detected a problem[1], and you usually know what routine produced bad output (it's the one that should have produced good output at that point).

Your suspect routine usually uses a relatively small set of inputs to do whatever it does. It can only react to data that it at least examines. So what does it use to do what it does? Where do its inputs come from, not necessarily in the sense of what calling routine passes them in, but in the sense of how they enter the overall program? What values can they take? What values would they have to take to produce the behavior you're seeing? How could they end up taking those particular values? What parts of the program and environment state are likely to vary, and how will they affect this routine?

Very often, you can answer those questions without getting bogged down in the call graph or having to try a huge number of cases. Sometimes you can not just reproduce the bug, but actually fix it.

  • If function X is complaining about a "font not found" error, then it's presumably looking up some font name or other identifier in some font database. There probably aren't that many places that the font identifier can be coming from, and there's probably only one font database.

    If you can say, "well, the font should either be the default system font, or a font specified in the user profile", then you can make an intuitive leap. You know that everything uses the default system font all the time, so that path probably works... but maybe it's possible for a user profile to end up referring to a font that doesn't exist... but I know the code checks for that, and I can't set a bogus font on my own profile... but wait, what if the font gets deleted after the user picks it?

    Of course, there are lots of other possibilities[2]. Maybe there's some weird corner case where the font database isn't available, or you're using the wrong one, or it's corrupted, or whatever. But it's unlikely to be something completely unrelated that's happening in some function in the call stack that doesn't use the font information at all.

  • Or maybe function X is searching by font attributes instead of names. So where might it be getting extra constraints on the query?

  • Or maybe function Y is blowing up trying to use a null value. The root cause is almost certainly that you made a poor choice of programming language, but you can't fix that now, so persevere. Usually you have the line of code where it choked, but say you don't. What values are in scope in Y that could be null? Well, it gets called with a font descriptor. Could that be null? Well, wait, the font lookup routine returns a null value if it can't find a font. So maybe we have a font not found error in disguise. So try thinking about it, for a limited time, as a font-not-found error. (See the sketch after this list.)

  • Or maybe "fnord!" is showing up randomly in the output. So where could it come from? First step: brute force. grep -ir 'fnord' src. Leave off the exclamation point at least to start. Punctuation tends to be quoted weirdly or added by code. If it's not there, is it in the binary? Is it in the database? Is it in the config file? Is it anywhere on the whole damned system? If not, that leaves what? Probably the network.

In the end, though, there's also a certain amount of pattern matching against experience. "Code that does X is usually structured like Y". "Does this thing have access to the files it needs?". "Programmers always forget about cases like Z". "I always forget about cases like W". "Weird daemon behavior that you can't reproduce interactively is always caused by SELinux".

Once you do reproduce the bug, you can always just switch to a strategy of brute force tracing everything that goes on anywhere near it, with a debugger, with printing or logging, with a system call tracer, or whatever. But, yeah, you've got to get it to happen if you want to trace it.
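
"Tracing everything" can be as crude as a logging decorator slapped onto the suspects. Here's a minimal Python sketch, with suspect_function as a hypothetical stand-in for whatever code you actually distrust:

    import functools
    import logging

    logging.basicConfig(level=logging.DEBUG)

    def traced(fn):
        # Log every call and return value of a suspect function once the
        # bug reproduces. Crude, but brute force is the point here.
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            logging.debug("CALL %s args=%r kwargs=%r", fn.__name__, args, kwargs)
            result = fn(*args, **kwargs)
            logging.debug("RET  %s -> %r", fn.__name__, result)
            return result
        return wrapper

    @traced
    def suspect_function(x):  # hypothetical stand-in
        return x * 2

    suspect_function(21)

Debuggers and system call tracers scale better, but a dozen lines like these are often faster to deploy inside somebody else's unfamiliar build system.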

How does one "read the docs?"

It depends on what you're trying to find out.

I find I mostly use two kinds of documentation:

  1. Architectural stuff. General material that explains the concepts running around in the code, the terminology, what objects exist, what their life cycles look like, etc. This sort of documentation is often either nonexistent, or so bloated and disorganized as to be useless for quick debugging, and maybe useless period. But if it exists and is any good, it's gold. If you're trying to educate yourself about something that you're going to use heavily, you may just have to slog through all of whatever's available.

  2. API documentation, ideally with links to source code. I usually don't even skim this. I navigate it with keyword search. If you want to know how to frobnicate a blenk, then search for "frobnicate" and whatever synonyms you can come up with[3]. If you're trying to debug a stack trace from a library routine, look up that routine and see what parameters it takes. (A sketch of that kind of keyword search follows below.)
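
Keyword search doesn't have to mean the browser's find-in-page; for locally checked-out docs, a few lines of Python do the same job. The docs path, file glob, and keywords here are all hypothetical:

    # Hypothetical sketch: keyword-search a local API-doc tree instead of
    # reading it linearly. Adjust the path, glob, and keywords to taste.
    import pathlib
    import re

    PATTERN = re.compile(r"frobnicate|frob|blenk", re.IGNORECASE)

    for path in pathlib.Path("docs").rglob("*.md"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if PATTERN.search(line):
                print(f"{path}:{lineno}: {line.strip()}")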

In my never especially humble opinion, "tutorials" are mostly wastes of time beyond the first couple of units, and it's unfortunate that people concentrate so much on them instead of writing decent architecture and concept documentation.


  1. Blowing up with a stack trace counts as "detecting the problem". ↩︎

  2. One important part of the skill is allocating your time and attention, and not getting stuck on paths that aren't bearing fruit. You can always come back to an idea if nothing else works either. The counter-consideration is that if you don't think at least a little bit deeply about whatever avenue you're exploring, you're unlikely to have a good sense of how fruitful it looks. So you have to balance breadth against depth in your search. ↩︎

  3. If the architecture documentation doesn't suck, you can get likely terms by reading it. Or ask an LLM that's probably internalized it. Otherwise you just have to read the writer's mind. If you get good enough at reading their mind, you can use a certain amount of keyword search in the architecture documents, too. ↩︎

The Comprehensive Case Against Trump
jbash · 1mo

Are these countries very poor, where PEPFAR would be a huge percent of their GDP,

They can be poor at the moment, and you can hope they will eventually be rich, so you can trade with them. Maybe you try to make that happen. It's uncertain, yes, and in some cases not very likely at all. But it can still be one of your reasons.

or are they so rich that the goodwill generated exceeds the charity?

It's unlikely you'll make a net profit off them right now. You can; occasionally there's some really valuable deal. In expectation it's probably a financial loss in the near term. But it's not a dead loss in expectation.

And you might cut down on the number of other governments that decide not to let you run a road across a corner of their territory or whatever. And on the number of random not-necessarily-government people harassing shipping. People who aren't profitable trading partners, or even in the picture on a particular aid decision, can still seriously obstruct things that are profitable.

Or, is it that they virtue signal to other, richer countries that America is a benevolent dictator, and it's okay to keep the dollar hegemony?

Sure, that's one big reason. Did you think I'd say it wasn't?

I mean, I'm not saying USAID was anywhere near the core of it, but that hegemony didn't happen for no reason to begin with. It helps to be big, it helps to be everywhere, it helps to be ready to deal, it helps to be at least relatively trustworthy about keeping bargains, and it helps to have not been as devastated as everybody else in a huge war at a critical time. But it also truly helps to be seen as the "good guy".

The hope you mention for reciprocity from future hegemons is also a possible reason, although I don't know that the people actually making the decisions are thinking in those terms, and I'm not sure that memories are that long.

You can do this stuff because you think your people want to help the unfortunate overseas[1] , and because you want to cut down on the amount of HIV or whatever sloshing around the planet[2], and because it plays well with people in other rich countries, and because it makes poor countries less likely to get in your way just because they can, and because it may build markets, and because it's cover for both spies and not-spies-who-are-still-good-information-sources, and because it tends to mean you get consulted (or at least hear about it) when people are making decisions about this or that region, and for whatever other reasons, and no single one of them has to carry the entire burden.


  1. Which a large majority of them do, by the way. ↩︎

  2. You'd like to eradicate it domestically, but you can't actually do that without eradicating it globally. Sure, it's a long-term project, but it never happens if you don't work on it. ↩︎

The Comprehensive Case Against Trump
jbash · 1mo

How are you deciding what qualifies as a "public good"?

Why not have each person deciding whether they value roads enough to subscribe to a road company, or whether they value an educated public enough to contribute to that?

That sort of thing sounded good to me in my teens, but then I realized that the practical, real world result would be no roads. Actually I think it started with doubts about the sheer number of wearying decisions one would have to make to live that way... and then kind of clarified into the idea that, in fact, what with the coordination issues and free riding and all, there would in fact be no roads.

Eventually I decided I like roads enough to "hurt" a few others to get them. But there's no fundamental difference that makes roads a "public good" and any other desire anybody might happen to have not a "public good".

It seems obvious the government shouldn't do that, unless the Nazis are going to sink our ships or declare war on our economic allies.

But wait, now you're assuming that everybody cares about economic allies. What if somebody doesn't feel they get value out of foreign trade? Why should they pay? Similarly, if you own a ship and the Nazis might sink it, then why aren't you paying to protect it, rather than demanding that everybody pay? How is protecting your ship a public good?

And if you do want to put shipping and foreign trade in some special "public good" category, what about the foreign trade value that came out of the goodwill PEPFAR and other USAID programs were creating? Or for that matter the foreign trade value of just generally boosting people's economic welfare worldwide? You can't trade with people who have nothing and produce nothing. There was a lot of US economic self-interest motivating many of the things USAID was doing. For that matter, it was also a source of intelligence that was sometimes used to stop ship-sinky sorts of activities, as well as, again, a source of goodwill that made those activities less attractive to a bunch of potential ship-sinkers. Seeing that stuff as pure charity is deeply naive.
