Shortform Content

What's a good technical introduction to Decision Theory and Game Theory for alignment researchers? I'm guessing standard undergrad textbooks don't include, say, content about logical decision theory. I've mostly been reading posts on LW but as with most stuff here they feel more like self-contained blog posts (rather than textbooks that build on top of a common context) so I was wondering if there was anything like a canonical resource providing a unified technical / math-y perspective on the whole subject.

The MIRI Research Guide recommends "An Introduction to Decision Theory" and "Game Theory: An Introduction". I have read neither and am simply relaying the recommendation.

Reading https://www.lesswrong.com/posts/nwJCzszw8gGjPTihM/i-still-think-it-s-very-unlikely-we-re-observing-alien and pondering the Bigfoot thing.

On the one hand, We Have Cameras Everywhere(TM).

On the other hand -- pick any area of the Pacific Northwest and look at a map of where the permanent roads are. Pull it up side by side with a map of an area that you're familiar with. Zoom in on both, to a magnification you'd consider reasonable for imagining things at walking-around scale. Pan around on the PNW map and try to find a permanent road. It'll take a min...

Consider two claims:

  • Any system can be modeled as maximizing some utility function, therefore utility maximization is not a very useful model
  • Corrigibility is possible, but utility maximization is incompatible with corrigibility, therefore we need some non-utility-maximizer kind of agent to achieve corrigibility

These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.
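The first claim rests on a well-known trivializing construction, which can be made concrete in a few lines (a toy sketch; the policy and state names below are invented for illustration):

```python
# Claim 1 in miniature: for any fixed policy, we can define a utility
# function that the policy maximizes -- assign utility 1 to whatever
# action the policy takes in each state, 0 otherwise.

def rationalize(policy):
    """Return a utility function u(state, action) that `policy` maximizes."""
    return lambda state, action: 1.0 if action == policy(state) else 0.0

# An arbitrary (even "corrigible-looking") policy...
policy = lambda state: "shut_down" if state == "operator_pressed_button" else "work"
u = rationalize(policy)

# ...is a utility maximizer under u in every state:
for state in ["operator_pressed_button", "work_day"]:
    best = max(["shut_down", "work"], key=lambda a: u(state, a))
    assert best == policy(state)
```

Under this construction every policy, corrigible or not, maximizes *some* utility function, which is exactly why claim 1, read that broadly, drains "utility maximizer" of predictive content.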

I exp...

FWIW I endorse the second claim when the utility function depends exclusively on the state of the world in the distant future, whereas I endorse the first claim when the utility function can depend on anything whatsoever (e.g. what actions I’m taking right this second). (details)

I wish we had different terms for those two things. That might help with any alleged yay/boo reasoning.

(When Eliezer talks about utility functions, he seems to assume that it depends exclusively on the state of the world in the distant future.)

Vladimir_Nesov (12h):
A utility function represents preference elicited in a large collection of situations, each a separate choice between events made with incomplete information, as an event is not a particular point. This preference needs to be consistent across different situations to be representable by expected utility of a single utility function. Once formulated, a utility function can be applied to a single choice/situation, such as a choice of a policy. But a system that only ever makes a single choice is not a natural fit for the expected utility frame, and that's the kind of system that usually appears in "any system can be modeled as maximizing some utility function". So it's not enough to maximize something once, or in a narrow collection of situations: the situations the system is hypothetically exposed to need to be about as diverse as choices between any pair of events, with some of the events very large, corresponding to unreasonably incomplete information, all drawn across the same probability space.

One place this mismatch of frames happens is with updateless decision theory. An updateless decision is a choice of a single policy, once and for all, so there is no reason for it to be guided by expected utility [https://www.lesswrong.com/posts/XYDsYSbBjqgPAgcoQ/why-the-focus-on-expected-utility-maximisers?commentId=a5tn6B8iKdta6zGFu], even though it could be. The utility function for the updateless choice of policy would then need to be obtained elsewhere, in a setting that has all these situations with separate (rather than all enacting a single policy) and mutually coherent choices under uncertainty.

But once an updateless policy is settled (by a policy-level decision), the actions implied by it (rather than action-level decisions in the expected utility frame) no longer need to be coherent. Not being coherent, they are not representable by an action-level utility function. So by embracing updatelessness, we lose the setting that would elicit utility if the actions were
JNS (15h):
Completely off the cuff take: I don't think claim 1 is wrong, but it does clash with claim 2. That means any system that has to be corrigible cannot be a system that maximizes a simple utility function (1 dimension), or put another way, "whatever utility function it maximizes must be along multiple dimensions". Which seems to be pretty much what humans do: we have really complex utility functions, everything seems to be ever changing, and we have some control over it ourselves (and sometimes that goes wrong and people end up maxing out a singular dimension at the cost of everything else). Note to self: think more about this and, if possible, write up something more coherent and explanatory.

A Thousand Narratives. Theory of Memetic Evolution. Part 2/20
A new way of doing the same thing

"Is an ant colony an organism, or is an organism a colony?" 
- Mark A. Changizi

As of now, there are two kinds of evolution: genetic evolution and memetic evolution. The first one is your usual evolution concerned with "change in the heritable characteristics of biological populations over successive generations", responsible for all the biological diversity that we know, and happening on the scale of at least hundreds of years. Memetic evolution, strictly spe...

A small dialogue originally meant for Dreaming of Utility, on the a priori origins of causal articulation in physical systems. I can't find a way to properly explain the intuitive notion that... to objectively secure a subjective goal is like closing a set in the Zariski topology, or generating an ideal from a nice cluster of elements of a ring: you get a bunch of weird and unboundedly-exploitable stuff, because that's just what the degrees of freedom your subjective goal requires give rise to. 

(Alice) How do you protect a physical thing from the outs...

Someone just told me that the solution to conflicting experiments is more experiments. Taken literally this is wrong: more experiments just means more conflict. What we need are fewer experiments. We need to get rid of the bad experiments.

Why expect that future experiments will be better? Maybe if the experimenters read the past experiments, they could learn from them. Well, maybe, but maybe if you read the experiments today, you could figure out which ones are bad today. If you don't read the experiments today and don't bother to judge which ones are better, what incentive is there for future experimenters to make better experiments, rather than accumulating conflict?

Reasonably we need both, but most of all we need some way to figure out what happened in the situation where we have conflicting experiments, so as to be able to say "these results are invalid because XXX".

Probably more of an adversarial process, where experiments and their results must be replicated*. That means experiments must be documented in far more detail, and the data must be much clearer, especially the steps that happen in cleanup etc.

Personally I think science is in crisis: people are incentivized to write lots of papers, publish resul...

In response to / inspired by this SSC post:

I was originally going to comment something about "how do I balance this with the need to filter for niche nerds who are like me?", but then I remembered that the post is actually literally about dunks/insults on Twitter. o_0

This, in meta- and object-level ways, got to a core problem I have: I want to do smart and nice things with smart and nice people, yet these (especially the social stuff) require me to be so careful + to actually have anything like a self-filter. And even trying to practice/exercise that basic s...

Sometimes I have an internal desire to do something different than what I think should be done (for example, I might desire to play a game while also thinking the better choice is to read). I've been experimenting with using randomness to mediate this. I keep a D20 with me, give each side of the dispute odds proportional to the strength of its resolve, and then roll the die.

In theory, this means neither side will overpower the other, and even a small resolve still has a chance. I'm not sure how useful this is, but it's fun, and can sort of g...
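For what it's worth, the dice procedure is easy to simulate if you want to tune the odds before committing to a physical roll (a minimal sketch; the option names and face counts below are made up):

```python
import random

def mediate(options, sides=20):
    """Roll one die of `sides` faces; each option in the internal dispute
    gets a number of faces proportional to the strength of its resolve.
    `options` is a list of (name, faces) pairs whose faces sum to `sides`."""
    assert sum(faces for _, faces in options) == sides
    roll = random.randint(1, sides)
    for name, faces in options:
        if roll <= faces:
            return name
        roll -= faces

# e.g. "read" has strong resolve (14 faces), "play the game" weaker (6 faces)
choice = mediate([("read", 14), ("play the game", 6)])
```

Even the weak side keeps a 6/20 chance, which is the point: neither side gets to overrule the other outright.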

"EV is measure times value" is a sufficiently load-bearing part of my worldview that if measure and value were correlated or at least one was a function of the other I would be very distressed.

Like in a sense, is John threatening to second-guess hundreds of years of consensus on is-ought?
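For concreteness, the discrete form of "EV is measure times value" is just a sum of probability-weighted values (a minimal sketch; the toy distribution below is invented):

```python
# Expected value = sum over outcomes of measure (probability) times value.
# If measure and value were correlated rather than independent inputs,
# this factorization would stop carrying information -- hence the distress.

def expected_value(outcomes):
    """outcomes: iterable of (probability, value) pairs; probabilities sum to 1."""
    return sum(p * v for p, v in outcomes)

ev = expected_value([(0.5, 10.0), (0.3, -5.0), (0.2, 0.0)])  # = 3.5
```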

Noosphere89 (7d):
I'm not sure what measure is referring to here.
Quinn (7d):
probability density

A Thousand Narratives. Theory of Memetic Evolution. Part 1/20. Intro

The ultimate goal of this line of research is to gain a better understanding of how the human value system operates. The problem I see with current approaches to studying values is that we cannot study {values/desires/preferences} in isolation from the rest of our cognitive mechanisms, because according to the latest theories values are just one part of a broader system governing behaviour in general. So you have to have a decent model of human behaviour first to then be able to explain value ...

I don't see how we could ever get superhuman intelligence out of GPT-3. My understanding is that the goal of GPT neural nets is to predict the next token based on web text written by humans. GPT-N as N -> inf will be perfect at creating text that could be written by the average internet user.

But the average internet user isn't that smart! Let's say there's some text on the internet that reads, "The simplest method to break the light speed barrier is..." The most likely continuation of that text will not be an actual method to break the light speed barrier! It'll probably be some technobabble from a sci-fi story. So that's what we'll get from GPT-N!
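For reference, the objective under discussion, predicting the next token, is just average negative log-likelihood over a corpus. A toy bigram version of it (nothing below is GPT's actual implementation):

```python
import math

def next_token_loss(probs, sequence):
    """probs[prev][nxt] = model's P(next token = nxt | previous token = prev).
    Returns the average negative log-likelihood of each next token --
    the quantity GPT-style training drives toward its minimum."""
    nll = 0.0
    for prev, nxt in zip(sequence, sequence[1:]):
        nll -= math.log(probs[prev][nxt])
    return nll / (len(sequence) - 1)

# A model that has memorized "the cat sat" perfectly gets zero loss on it:
probs = {"the": {"cat": 1.0}, "cat": {"sat": 1.0}}
loss = next_token_loss(probs, ["the", "cat", "sat"])  # = 0.0
```

The point of the argument above is that the minimizer of this loss matches the *distribution* of human-written continuations, sci-fi technobabble included, not the most useful continuation.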

the gears to ascension (1y):
The problem isn't that people are trying to parent AIs into not being assholes via social justice knowledge; the problem is that the people receiving the social justice knowledge are treating it as an attempt to avoid being canceled, when they need to be seeking out ways to turn it into constructive training data. Social justice knowledge is actually very relevant here: align the training data, (mostly) align the AI.

Worries about quality of generalization are very valid, and the post about reward model hacking is a good introduction to why reinforcement learning is a bad idea. However, current unsupervised learning only desires to output truth. Ensuring that the training data represents a convergence process from mistakes towards true social justice seems like a very promising perspective to me, and not one to trivially dismiss.

Ultimately AI safety is most centrally a parenting, psychology, and vibes problem, with some additional constraints due to issues with model stability, reflection, sanity, "AI psychiatry". Also, AI is not plateauing.
frontier64 (1y):
My understanding is that it's possible there's a neural net along the path of GPT-1 -> N that plateaus at perfectly predicting the next token of text written by a human that stops way short of having to model the entire Earth. And that would basically be a human internet poster right? If you create one of those, then training it with more text, more space, and more compute won't make a neural net that models the earth. It'll just make that same neural net that works perfectly on its own with a bunch of extra wasted space. I'm not too sure my understanding is correct though.

(I realize this is an old thread, but I thought the conversation was interesting.  If responding to such an old thread is norm-breaking, I apologize.)

My understanding is that it's possible there's a neural net along the path of GPT-1 -> N that plateaus at perfectly predicting the next token of text written by a human that stops way short of having to model the entire Earth. And that would basically be a human internet poster right?

(I'm completely ignoring the discussion about whether GPT-N needs to model the entire Earth in order to predict text. ...

There's something very creepy to me about the part of research consent forms where it says "my participation was entirely voluntary."

  1. Do they really think an involuntary participant wouldn't sign that? If they understand that they would, what purpose could this possibly serve, other than, as is commonly the purpose of contracts, absolving themselves of blame and shifting it to the participant? Which would be downright monstrous. Probably they just aren't fucking consequentialists, but this is all they end up doing.
  2. This is a minor thing, but it adds an addi...

If someone explicitly writes into their consent form "my participation was entirely voluntary" and the participation isn't voluntary, it might be easier to attack the person running the trial later.

rodeo_flagellum (3d):
  Important to remember and stand by the Nuremberg Code [https://history.nih.gov/download/attachments/1016866/nuremberg.pdf?version=1&modificationDate=1589152811742&api=v2] in these contexts. 
frontier64 (3d):
The reason is to prevent the voluntary participant from later claiming that their participation was involuntary and telling that to the IRB. 'Well if your participation was involuntary, why did you sign this document?' It kind of limits the arguments someone could make attacking the ethics of the study. The attacker would have to allege coercion on the order of people being forced to lie on forms under threat.

Accurately assessing sex-related characteristics saves lives. Can we make it fair to all humans, women, men, trans and inter folks? A nerdy idea.

Sex-related characteristics are medically relevant; accurately assessing them saves lives.
But neither assigned sex nor gender identity alone properly captures them. Is anyone else interested in designing a characteristic string instead, so all humans, esp. all women and gender diverse folks, get proper medical care?

This idea started yesterday, when I had severe abdominal pain, and started googling.
Eventually, I rea...

The standard way to run a medical trial is to focus on people who are "normal". That usually means that people in clinical trials don't take other drugs that have side effects. From a clinical trial standpoint, taking hormones is taking a drug with a lot of side effects, one that relatively few people in the population take.

The average clinical trial does not recruit enough trans participants to measure effects on them, and running clinical trials is already expensive enough as it is. That's extra true if you want to distinguish between...

The following is a conversation between myself in 2022, and a newer version of me earlier this year.

On the Nature of Intelligence and its "True Name":

2022 Me: This has become less obvious to me as I've tried to gain a better understanding of what general intelligence is. Until recently, I always assumed that intelligence and agency were the same thing. But general intelligence, or G, might not be agentic. Agents that behave like RL agents may only be narrow forms of intelligence, without generalizability. G might be something closer to a simula...

Does anyone here know of (or would be willing to offer) funding for creating experimental visualization tools?

I’ve been working on a program which I think has a lot of potential, but it’s the sort of thing where I expect it to be most powerful in the context of “accidental” discoveries made while playing with it (see e.g. early use of the microscope, etc.).

I’d also post in the “welcome” thread.

My take on complex systems theory is that many of the arguments proposed in its favor would keep giving the same predictions right up until it became blatantly obvious that we can in fact understand the relevant system. Results like chaotic relationships, or stochastic relationships without a mean, seem like definitive arguments in favor of the science, though these are rarely posed about neural networks.

Merely pointing out that we don’t understand something, that there seems to be a lot going on, or that there exist nonlinear interactions imo isn...

I have downvoted my comment here, because I disagree with past me. Complex systems theory seems pretty cool from where I stand now, and I think past me has a few confusions about what complex systems theory even is.

Noticing I've been operating under a bias where I notice existential risk precursors pretty easily (EG, biotech, advances in computing hardware), but I notice no precursors of existential safety. To me it is as if technologies that tend to do more good than harm, or at least, would improve our odds by their introduction, social or otherwise, do not exist. That can't be right, surely?...

When I think about what they might be... I find only cultural technologies, or political conditions: the strength of global governance, the clarity of global discourses, per...

Noosphere89 (3d):
It's essentially a frame that views things in a negative light, or equivalently a frame that views a certain issue as negative by default unless action is taken. For example, climate change can be viewed in the negative frame, where we have to solve the problem or we all die, or in a positive frame, where we can solve the problem with green tech.
mako yass (3d):
I was hoping to understand why people who are concerned about the climate ignore greentech/SRM. One effect is that people who want to raise awareness about the severity of an issue have an incentive to avoid acknowledging solutions to it, because that diminishes its severity. But this is an egregore-level phenomenon; there is no individual negative cognitive disposition driving it, as far as I can tell. Mostly, in the case of climate, it seems to be driven by a craving for belonging in a political scene.

The point I was trying to make is that we click on and read negative news, and this skews our perception of what's happening. Critically, the negativity bias operates regardless of the actual reality of the problem; that is, it doesn't distinguish between things that are very bad, things that are merely bad but solvable, and things that are not bad at all.

In essence, I'm positing a selection effect, where we keep hearing more about the bad things, and hear less or none about the good things, so we are biased to believe that our world is more negative than it actually is.

And t...

(I promised I'd publish this last night no matter what state it was in, and then didn't get very far before the deadline. I will go back and edit and improve it later.)

 

I feel like I keep, over and over, hearing a complaint from people who get most of their information about college admissions from WhatsApp groups or their parents’ friends or a certain extraordinarily pervasive subreddit (you all know what I’m talking about). Something like “College admissions is ridiculous! Look at this person, who was top of his math class and took 10 AP classes and...

I used to have a model of breathing that went something like this: when breathing in, the lungs somehow get bigger, creating lower air pressure inside the lungs causing air to flow in. Then when breathing out the lungs get smaller, creating higher air pressure inside the lungs and causing air to flow out. How do the lungs get bigger and smaller? Eventually I learned that there's a muscle called the diaphragm that is attached to the bottom of the lungs (??) that pulls or pushes the lungs. If I keep my nose plugged but my mouth open, the air will travel thro...

This is the kind of question that ChatGPT can answer really well. 

O O (3d):
I did some quick experimentation. I found that if my tongue doesn’t block my mouth I can only breathe through my mouth, and if it does, I can only breathe through my nose. I then left my mouth airway unblocked by my tongue and blocked my mouth with my hand. It seems air doesn’t go through my nose in that case unless I breathe in really hard, in which case I hear and feel something opening in the back of my nose. I’m guessing there is another valve in the nose. If you had allergies growing up you’d already know all of this.
riceissa (3d):
Huh, this isn't what happens when I try it. If I keep my tongue out or at the base of my mouth, I can still definitely choose whether to make the air go through my nose or mouth. If I try to block my mouth with my tongue, that does obstruct the airflow through my mouth but I can still breathe mostly okay (even if I plug my nose).

I have a heuristic for evaluating topics to write about: I especially look for topics that people are usually averse to writing about. Topics that score high on this heuristic might be good to write about, as they can yield content with high utility compared to what is available, simply because other content of this kind (and especially good content of this kind) is rare.

Somebody told me that they read some of my writing and liked it. They said that they liked how honest it was. Perhaps writing about topi...
