# All of Alex_Altair's Comments + Replies

Is there a bug around resizing images? Previously I've found that my image size choice is ignored unless the image has a caption. But for gifs, it seems to ignore it even if there is a caption, instead rendering the image at the full width of the article.

The image must be hosted!

This is no longer true, right?

(Also, I came here looking for a list of supported image types; I'm trying to insert an SVG, but it's just getting ignored.)

2Raemon2d
I think most raster image formats should work fine (I'm not surprised that SVGs don't work, but, like, you can just take a screenshot of it and insert it or something).

Gotcha, that makes sense! Agreed that an announcement tag is a good solution.

4Raemon3d
I created this: https://www.lesswrong.com/tag/lw-team-announcements
I'm not 100% sure how well we'll stick to it but you can subscribe to it.

Meta-comment: It might be a good idea to create an official Lightcone-or-whatever LW account that you can publish these kinds of posts from. Then, someone could e.g. subscribe to that user, and get notified of all the official announcement-type posts, without having to subscribe to the personal account of Ruby-or-Ray-etc.

8Raemon3d
Tagging feels more like the right abstraction here (you can subscribe to tags). There's a Site Meta tag. We could make a LW Team Announcement tag or something. We have a general policy against organizational-accounts in most cases (since they make it less clear who you're talking to, and sort of shift culture into a more bureaucratic direction). So, I'd avoid a solution where we ourselves did that. :P
2Raemon3d
Fixed.

theoretical progress has been considerably faster than expected, while crossing the theory-practice gap has been mildly slower than expected. (Note that “theory progressing faster than expected, practice slower” is a potential red flag for theory coming decoupled from reality

I appreciate you flagging this. I read the former sentence and my immediate next thought was the heuristic in the parenthetical sentence.

Totally baseless conjecture that I have not thought about for very long; chaos is identical to Turing completeness. All dynamical systems that demonstrate chaotic behavior are Turing complete (or at least implement an undecidable procedure).

Has anyone heard of an established connection here?

8gwern8d
Might look at Wolfram's work. One of the major themes of his CA classification project is that chaotic (in some sense, possibly not the rigorous ergodic dynamics definition) rulesets are not Turing-complete; only CAs which are in an intermediate region of complexity/simplicity have ever been shown to be TC.

FWIW I cannot find your podcast by searching in the app "Pocket Casts" (though I can on spotify).

1Caspar429d
In general it seems that currently the podcast can only be found on Spotify.

If anyone's interested in doing an even less formal version of this, I think it would be really useful for me to have semi-regular chats with other people in the alignment space. This could be anything from "you mentor me for an hour a week at the Lightcone office" to "we chat for 15 minutes on zoom every few weeks". I feel reasonably connected to the community, but I think I would strongly benefit from more two-way real-time interaction.

(More info about me: I'm currently doing full-time independent alignment research, but just on my own, with no structure...

2Marius Hobbhahn12d
If it's easier for you, we can already facilitate that through M&M. Like we said, as long as both parties agree, you can do whatever makes sense for you :) But the program might make finding other people easier.

Heh, well, see the aforementioned

it's almost what "doing math" is for me

It also feels like you're asking something like, "what's the most important problem you are trying to solve by having visual perception?" It's kind of just how I navigate the world at all (atoms or math).

But let me take your question at face value and try to answer it.

I think the main answer is something like "semantics". So much of my experiential knowledge is encoded in this physical, 3D physics manner, and when I can match up a symbolic expression with a physical scenario, I get a w...

2AllAmericanBreakfast12d
The first math problem I created was spawned from a visualization on a rare occasion smoking marijuana. I started imagining cubes falling from the sky, and just observing this process. I got curious about the chance of one cube landing on top of another. This led to the problem: Given a square boundary of size B, place squares of side length S one at a time completely inside of the boundary at random locations. How many squares on average will you place before a pair of squares overlaps?
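This puzzle is easy to estimate numerically. Here's a quick Monte Carlo sketch (the function names and the particular B, S, and trial counts are my own choices, not anything from the comment):

```python
import random

def squares_until_overlap(B, S, rng):
    """Place S-sided squares uniformly at random, fully inside a B-sided
    square boundary, until a new square overlaps a placed one.
    Returns the number of squares placed before the overlapping one."""
    placed = []  # lower-left corners of successfully placed squares
    while True:
        x = rng.uniform(0, B - S)
        y = rng.uniform(0, B - S)
        # Two axis-aligned squares of equal side S overlap iff both
        # coordinate differences are smaller than S.
        if any(abs(x - px) < S and abs(y - py) < S for px, py in placed):
            return len(placed)
        placed.append((x, y))

def average_squares(B, S, trials=2000, seed=0):
    """Monte Carlo estimate of the average number of squares placed."""
    rng = random.Random(seed)
    return sum(squares_until_overlap(B, S, rng) for _ in range(trials)) / trials
```

For example, `average_squares(10, 1)` estimates the answer for a boundary ten times the square's side; a birthday-problem-style argument suggests the answer scales roughly like (B/S) for large boundaries.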

Right, so my question to you is, how do you do math?? (This is probably a silly question, but I'd love to hear your humor-me answer.)

4James_Miller11d
Last time I did math was when teaching game theory two days ago. I put a game on the blackboard. I wrote down an inequality that determined when there would be a certain equilibrium. Then I used the rules of algebra to simplify the inequality. Then I discussed why the inequality ended up being that the discount rate had to be greater than some number rather than less than some number.

It sure would be awesome if Lightcone Infrastructure spun up a Mastodon instance for the extended rationalist/EA/AI safety communities.

2niplav1mo
There is schelling.pt [https://schelling.pt/], which originated from the tpot/tcot on twitter, which I've used for the last year and a half.

Hm, it seems pretty dependent on ontology to me – that's pretty much what the set of all states is, an ontology for how the world could be.

In case you missed it, LW 2.0 has feature support for creating sequences. If you hover over your username, the menu has a link to https://www.lesswrong.com/sequencesnew

Is this written against some hypothetical "static world" assumption

Basically exactly that, yeah. But that assumption exists both on a conscious level (in that many people don't consciously realize how much the universe has changed) and on a subconscious level, in that many ways the world currently is feel stable, even if you know they're not.

I'm psyched to have a podcast version! The narrator did a great job. I was wondering how they were going to handle several aspects of the post, and I liked how they did all of them.

Totally agree. Oliver & co. won tons of Bayes points off me.

Heh, I'm still skimming enough to catch this, but definitely not evaluating arguments.

I'm definitely still open to both changing my mind about the best use of terms and also updating the terminology in the sequence (although I suspect that will be quite a non-trivial amount of modified prose). And I think it's best if I don't actually think about it until after I publish another post.

I'd also be much more inclined to think harder about this discussion if there were more than two people involved.

My main goal here has always been "clearly explain the existin...

2So8res1mo
Cool cool. A summary of the claims that feel most important to me (for your convenience, and b/c I'm having fun):

* K-complexity / "algorithmic entropy" is a bad concept that doesn't cleanly relate to physics!entropy or info!entropy.
* In particular, the K-complexity of a state s is just the length of the shortest code for s, and this is bad because when s has multiple codes it should count as "simpler". (A state with five 3-bit codes is simpler than a state with one 2-bit code.) (Which is why symmetry makes a concept simpler despite not making its code shorter.)
* If we correct our notion of "complexity" to take multiple codes into account, then we find that complexity of a state s (with respect to a coding scheme C) is just the info!cross-entropy H(s,C). Yay!

Separately, some gripes:

* the algorithmic information theory concept is knuckleheaded, and only approximates info!entropy if you squint really hard, and I'm annoyed about it
* I suspect that a bunch of the annoying theorems in algorithmic information theory are annoying precisely because of all the squinting you have to do to pretend that K-complexity was a good idea

And some pedagogical notes:

* I'm all for descriptive accounts of who uses "entropy" for what, but it's kinda a subtle situation because:
  * info!entropy is a very general concept,
  * physics!entropy is an interesting special case of that concept (in the case where the state is a particular breed of physical macrostate),
  * algo!entropy is a derpy mistake that's sniffing glue in the corner,
  * algo!entropy is sitting right next to a heretofore unnamed concept that is another interesting special case of info!(cross-)entropy (in the case where the code is universal).

(oh and there's a bonus subtlety that if you port algo!entropy to a situation where the coding schema has at most one code per state--which is emphatically not the case in algorithmic informatio

quantum mechanics famously provides the measure on phase-space that classical statistical mechanics took as axiomatic

I'd be interested in a citation of what you're referring to here!

The state-space (for particles) in statmech is the space of possible positions and momenta for all particles. The measure that's used is uniform over each coordinate of position and momentum, for each particle. This is pretty obvious and natural, but not forced on us, and:

1. You get different, incorrect predictions about thermodynamics (!) if you use a different measure.
2. The level of coarse graining is unknown, so every quantity of entropy has an extra "+ log(# microstates per unit measure)" which is an unknown additive constant. (I think this is separate from the relationship between bits and J/K, which is a multiplicative constant for entropy -- k_B -- and doesn't rely on QM afaik.)

On the other hand, Liouville's theorem gives some pretty strong justification for using this measure, alleviating (1) somewhat: https://en.wikipedia.org/wiki/Liouville%27s_theorem_(Hamiltonian)

In quantum mechanics, you have discrete energy eigenstates (...in a bound system, there are technicalities here...) and you can define a microstate to be an energy eigenstate, which lets you just count things and not worry about measure. This solves both problems:

1. Counting microstates and taking the classical limit gives the "dx dp" (aka "dq dp") measure, ruling out any other measure.
2. It tells you how big your microstates are in phase space (the answer is related to Planck's constant, which you'll note has units of position * momentum).

This section mostly talks about the question of coarse-graining, but you can see that "dx dp" is sort of put in by hand in the classical version: https://en.wikipedia.org/wiki/Entropy_(statistical_thermodynamics)#Counting_of_microstates

I wish I had a better citation but I'm not sure I do. In general it seems like (2) is talked about more in the l

Did you want your "abstract entropy" to encompass both of these?

Indeed I definitely do.

I would add a big fat disclaimer

There are a bunch of places where I think I flagged relevant things, and I'm curious if these seem like enough to you;

• The whole post is called "abstract entropy", which should tell you that it's at least a little different from any "standard" form of entropy
• The third example, "It helps us understand strategies for (and limits on) file compression", is implicitly about K-complexity
• This whole paragraph: "Many people reading this will have so
...
I initially interpreted "abstract entropy" as meaning statistical entropy as opposed to thermodynamic or stat-mech or information-theoretic entropy. I think very few people encounter the phrase "algorithmic entropy" enough for it to be salient to them, so most confusion about entropy in different domains is about statistical entropy in physics and info theory. (Maybe this is different for LW readers!)

This was reinforced by the introduction because I took the mentions of file compression and assigning binary strings to states to be about (Shannon-style) coding theory, which uses statistical entropy heavily to talk about these same things and is a much bigger part of most CS textbooks/courses. (It uses phrases like "length of a codeword", "expected length of a code [under some distribution]", etc. and then has lots of theorems about statistical entropy being related to expected length of an optimal code.)

After getting that pattern going, I had enough momentum to see "Solomonoff", think "sure, it's a probability distribution, presumably he's going to do something statistical-entropy-like with it", and completely missed the statements that you were going to be interpreting K complexity itself as a kind of entropy. I also missed the statement about random variables not being necessary. I suspect this would also happen to many other people who have encountered stat mech and/or information theory, and maybe even K complexity but not the phrase "algorithmic entropy", but I could be wrong.

A disclaimer is probably not actually necessary, though, on reflection; I care a lot more about the "minimum average" qualifiers both being included in statistical-entropy contexts. I don't know exactly how to unify this with "algorithmic entropy" but I'll wait and see what you do :)

Just mulling over other names, I think "description length" is the one I like best so far. Then "entropy" would be defined as minimum average description length.

I like "description length". One wrinkle is that entropy isn't quite minimum average description length -- in general it's a lower bound on average description length. If you have a probability distribution that's (2/3, 1/3) over two things, but you assign fixed binary strings to each of the two, then you can't do better than 1 bit of average description length, but the entropy of the distribution is 0.92 bits. Or if your distribution is roughly (.1135, .1135, .7729) over three things, then you can't do better than 1.23 bits, but the entropy is 1 bit. You can only hit the entropy exactly when the probabilities are all powers of 2. (You can fix this a bit in the channel-coding context, where you're encoding sequences of things and don't have to assign fixed descriptions to individual things. In particular, you can assign descriptions to blocks of N things, which lets you get arbitrarily close as N -> infinity.)
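The gap described above between entropy and the best achievable fixed code can be checked numerically. A small sketch (the function names are mine; `huffman_average_length` uses the standard identity that the optimal prefix code's average length equals the sum of the probabilities merged during Huffman's algorithm):

```python
import heapq
from math import log2

def entropy(p):
    """Shannon entropy in bits: a lower bound on average code length."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def huffman_average_length(p):
    """Average length of an optimal prefix code that assigns one fixed
    binary string per outcome.  Repeatedly merge the two smallest
    probabilities; the running sum of merged weights is the answer."""
    heap = list(p)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total
```

Running this on the examples above: for (2/3, 1/3) the entropy is about 0.918 bits while the best fixed code averages exactly 1 bit, and for (.1135, .1135, .7729) the entropy is about 1 bit while the best fixed code averages about 1.23 bits.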

That makes sense. In my post I'm saying that entropy is whatever binary string assignment you want, which does not depend on the probability distribution you're using to weight things. And then if you want the minimum average string length, it becomes in terms of the probability distribution.

Ah, I missed this on a first skim and only got it recently, so some of my comments are probably missing this context in important ways. Sorry, that's on me.

one of my personal spicy takes...

Omfg, I love hearing your spicy takes. (I think I remember you advocating hard tabs, and trinary logic.)

ə, pronounced "schwa", for 1/e

lug, pronounced /ləg/, for log base ə

nl for "negative logarithm"

XD XD guys I literally can't

Extremely pleased with this reception! I indeed feel pretty seen by it.

I think he suggested that this naming fits with something he wants to do with K complexity

I didn't mean something I'm doing, I meant that the field of K-complexity just straight-forwardly uses the word "entropy" to refer to it. Let me see if I can dig up some references.

K-complexity is apparently sometimes called "algorithmic entropy" (but not just "entropy", I don't think?) Wiktionary quotes Niels Henrik Gregersen:

I think this might be the crux! Note the weird type mismatch: "the statistical entropy of an ensemble [...] the ensemble average of the algorithmic entropy of its members". So my story would be something like the following:

1. Many fields (thermodynamics, statistical mechanics, information theory, probability) use "entropy" to mean something equivalent to "the expectation of -log(p) for a distribution p". Let's call this "statistical entropy", but in practice people call it "entropy".
2. Algorithmic information theorists have an interestingly related but distinct concept, which they sometimes call "algorithmic entropy".

Whoops, hang on a sec. Did you want your "abstract entropy" to encompass both of these? If so, I didn't realize that until now! That changes a lot, and I apologize sincerely if waiting for the K-complexity stuff would've dissipated a lot of the confusion.

Things I think contributed to my confusion:

1. Your introduction only directly mentions / links to domain-specific types of entropy that are firmly under (type 1) "statistical entropy".
2. This intro post doesn't yet touch on (type 2) algorithmic entropy, and is instead a mix of type-1 and your abstract thing where description length and probability distribution are decoupled.
3. I suspect you were misled by the unpedagogical phrase "entropy of a macrostate" from statmech, and didn't realize that (as used in that field) the distribution involved is determined by the macrostate in a prescribed way (or is the macrostate).

I would add a big fat disclaimer that this series is NOT just limited to type-1 entropy, and (unless you disagree with my taxonomy here) emphasize heavily that you're including type-2 entropy.

Part of what confuses me about your objection is that it seems like averages of things can usually be treated the same as the individual things. E.g. an average number of apples is a number of apples, and average height is a height ("Bob is taller than Alice" is treated the same as "men are taller than women"). The sky is blue, by which we mean that the average photon frequency is in the range defined as blue; we also just say "a blue photon".

A possible counter-example I can think of is temperature. Temperature is the average [something like] kinetic energ...

I think it's different because entropy is an expectation of a thing which depends on the probability distribution that you're using to weight things. Like, other things are maybe... A is the number of apples, sum of p×A is the expected number of apples under distribution p, sum of q×A is the expected number of apples under distribution q. But entropy is... -log(p) is a thing, and sum of p × -log(p) is the entropy. And the sum of q × -log(p) is... not entropy! (It's "cross-entropy")
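The distinction in this comment is a one-liner in code. A minimal sketch (function names are my own; this is just the textbook formula, not anything specific to the sequence):

```python
from math import log2

def entropy(p):
    """Sum of p × (-log p): surprisal weighted by p's own probabilities."""
    return sum(pi * -log2(pi) for pi in p if pi > 0)

def cross_entropy(q, p):
    """Sum of q × (-log p): surprisal from p, weighted by a *different*
    distribution q.  This equals entropy(p) only when q == p."""
    return sum(qi * -log2(pi) for qi, pi in zip(q, p) if qi > 0)
```

By Gibbs' inequality, `cross_entropy(q, p)` is always at least `entropy(q)`, with equality exactly when the two distributions match; that's one way to see why swapping in the "wrong" weighting gives you a genuinely different quantity.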

(Let's not call it "probability" because that has too much baggage.)

This aside raises concerns for me, like it makes me worry that maybe we're more deeply not on the same page. It seems to me like the weighing is just straight-forward probability, and that it's important to call it that.

I think I was overzealous with this aside and regret it. I worry that the word "probability" has connotations that are too strong or are misleading for some use cases of abstract entropy. But this is definitely probability in the mathematical sense, yes. Maybe I wish mathematical "probability" had a name with weaker connotations.

One thing I'm not very confident about is how working scientists use the concept of "macrostate". If I had good resources for that I might change some of how the sequence is written, because I don't want to create any confusion for people who use this sequence to learn and then go on to work in a related field. (...That said, it's not like people aren't already confused. I kind of expect most working scientists to be confused about entropy outside their exact domain's use.)

I think it might be a bit of a mess, tbh. In probability theory, you have outcomes (individual possibilities), events (sets of possibilities), and distributions (assignments of probabilities to all possible outcomes).

* "microstate": outcome.
* "macrostate": sorta ambiguous between event and distribution.
* "entropy of an outcome": not a thing working scientists or mathematicians say, ever, as far as I know.
* "entropy of an event": not a thing either.
* "entropy of a distribution": that's a thing!
* "entropy of a macrostate": people say this, so they must mean a distribution when they are saying this phrase.

I think you're within your rights to use "macrostate" in any reasonable way that you like. My beef is entirely about the type signature of "entropy" with regard to distributions and events/outcomes.

Here's another thing that might be adding to our confusion. It just so happens that in the particular system that is this universe, all states with the same total energy are equally likely. That's not true for most systems (which don't even have a concept of energy), and so it doesn't seem like a part of abstract entropy to me. So e.g. macrostates don't necessarily contain microstates of equal probability (which I think you've implied a couple times).

Honestly, I'm confused about this now. I thought I recalled that "macrostate" was only used for the "microcanonical ensemble" (fancy phrase for a uniform-over-all-microstates-with-same-(E,N,V) probability distribution), but in fact it's a little ambiguous. Wikipedia says which implies microcanonical ensemble (the others are parametrized by things other than (E, N, V) triples), but then later it talks about both the canonical and microcanonical ensemble.

I think a lot of our confusion comes from the way physicists equivocate between macrostates as a set of microstates (with the probability distribution unspecified) and as a probability distribution. Wiki's "definition" is ambiguous: a particular (E, N, V) triple specifies both a set of microstates (with those values) and a distribution (uniform over that set).

In contrast, the canonical ensemble is a probability distribution defined by a triple (T,N,V), with each microstate having probability proportional to exp(-E / kT) if it has particle number N and volume V, otherwise probability zero. I'm not sure what "a macrostate specified by (T,N,V)" should mean here: either the set of microstates with (N, V) (and any E), or the non-uniform distribution I just described.

(By the way: note that when T is being used here, it doesn't mean the average energy, kinetic or otherwise. kT isn't the actual energy of anything, it's just the slope of the exponential decay of probability with respect to energy. A consequence of this definition is that the expected kinetic energy in some contexts is proportional to temperature, but this expectation is for a probability distribution over many microstates that may have more or less kinetic energy than that. Another consequence is that for large systems, the average kinetic energy of particles in the actual true microstate is very likely to be very close to (some multiple of) kT, but this is because of the law of large numbers and is not true for small systems. Note that there's two dif

I'm not quite sure what the cruxes of our disagreement are yet. So I'm going to write up some more of how I'm thinking about things, which I think might be relevant.

When we decide to model a system and assign its states entropy, there's a question of what set of states we're including. Often, we're modelling part of the real universe. The real universe is in only one state at any given time. But we're ignorant of a bunch of parts of it (and we're also ignorant about exactly what states it will evolve into over time). So to do some analysis, we decide on so...

I think the crux of our disagreement [edit: one of our disagreements] is whether the macrostate we're discussing can be chosen independently of the "uncertainty model" at all. When physicists talk about "the entropy of a macrostate", they always mean something of the form:

* There are a bunch of p's that add up to 1. We want the sum of p × (-log p) over all p's. [EXPECTATION of -log p aka ENTROPY of the distribution]

They never mean something of the form:

* There are a bunch of p's that add up to 1. We want the sum of p × (-log p) over just some of the p's. [???]

Or:

* There are a bunch of p's that add up to 1. We want the sum of p × (-log p) over just some of the p's, divided by the sum of p over the same p's. [CONDITIONAL EXPECTATION of -log p given some event]

Or:

* There are a bunch of p's that add up to 1. We want the sum of (-log p) over just some of the p's, divided by the number of p's we included. [ARITHMETIC MEAN of -log p over some event]

This also applies to information theorists talking about Shannon entropy. I think that's the basic crux here.

This is perhaps confusing because "macrostate" is often claimed to have something to do with a subset of the microstates. So you might be forgiven for thinking "entropy of a macrostate" in statmech means:

* For some arbitrary distribution p, consider a separately-chosen "macrostate" A (a set of outcomes). Compute the sum of p × (-log p) over every p whose corresponding outcome is in A, maybe divided by the total probability of A or something.

But in fact this is not what is meant! Instead, "entropy of a macrostate" means the following:

* For some "macrostate", whatever the hell that means, we construct a probability distribution p. Maybe that's the macrostate itself, maybe it's a distribution corresponding to the macrostate, usage varies. But the macrostate determines the distribution, either way. Compute the sum of p × (-log p) over every p.

EDIT
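To make the distinction between these forms concrete, here's a tiny numeric sketch (the distribution and the event are my own toy choices). Only the first quantity is what's meant by "entropy"; the others all come out different even on this small example:

```python
from math import log2

# A toy distribution over four outcomes, and an arbitrary event A.
p = [0.5, 0.25, 0.125, 0.125]
A = [0, 1]  # indices of the outcomes in the event

# ENTROPY of the distribution: sum of p × (-log p) over all p's.
entropy = sum(pi * -log2(pi) for pi in p)                    # = 1.75

# Sum of p × (-log p) over just some of the p's:
restricted_sum = sum(p[i] * -log2(p[i]) for i in A)          # = 1.0

# CONDITIONAL EXPECTATION of -log p given the event A:
conditional = restricted_sum / sum(p[i] for i in A)          # = 1.333...

# ARITHMETIC MEAN of -log p over the event A:
arithmetic_mean = sum(-log2(p[i]) for i in A) / len(A)       # = 1.5
```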

The historical baggage is something that tripped me up, too. In an upcoming post I have a section about classical thermodynamic entropy, including an explanation of the weird units!

1mukashi1mo
That's great, I subscribed and looking forward to it!

Nice catches. I love that somebody double-checked all the binary strings. :)

I think it's also important for my definition of optimization (coming later), because individual microstates do deserve to be assigned a specific level of optimization.

That's a reasonable stance, but one of the main messages of the sequence is that we can start with the concept of individual states having entropy assigned to them, and derive everything else from there! This is especially relevant to the idea of using Kolmogorov complexity as entropy. Calling it "surprisal" or "information" has an information-theoretic connotation to it that I think doesn't apply in all contexts.

I'm fine with choosing some other name, but I think all of the different "entropies" (in stat mech, information theory, etc) refer to weighted averages over a set of states, whose probability-or-whatever adds up to 1. To me that suggests that this should also be true of the abstract version. So I stand by the claim that the negative logarithm of probability-or-whatever should have some different name, so that people don't get confused by the ([other thing], entropy) → (entropy, average entropy) terminology switch. I think "average entropy" is also (slightly) misleading because it suggests that the -log(p)'s of individual states are independent of the choice of which microstates are in your macrostate, which I think is maybe the root problem I have with footnote 17. (See new comment in that subthread)

Maybe a slightly better title to the post would be "Plans are prediction, not optimization targets"? I found the "plans are predictions" part of the post to be the most insightful, and the rewording also removes a "should".

Loved this post. Both because I think this is a valuable set of reasoning heuristics, and because I read it in your voice, which made it feel something like a rationalist standup routine.

Should there be an "advice for new orgs" tag?

5Ben Pace2mo
For the record, my sense of the biggest single problem with new orgs is that they don't know that they should read Paul Graham's essays, and more importantly they don't know that they should watch the YC video lectures [https://www.youtube.com/watch?v=CBYhVcO4WgI]. I feel in my conversations like I just don't have a shared referent for what a 'functional organization' looks like, these people keep talking about hiring as though it's a good thing, about looking professional, and so on. No, top priority is small number of people, and getting the key thing done impressively fast, whilst letting everything else be on fire. If people knew they'd be judged for not having read that stuff, I feel like a lot of problems would just go away.

The Role of Deliberate Practice in the Acquisition of Expert Performance (PDF)

This link seems broken (though a google search finds many copies of the PDF).

To anyone landing on this page, the CFAR handbook is now available on LessWrong as a native sequence.

I'd prefer the S'wentworth Law of Measurement

2shminux2mo
Hah, I was just guessing at the real name.

It might be useful to add a quick summary of how arXiv works. I vaguely had the impression that anyone could upload PDFs to it, but some of the comments seem to pretty solidly disagree with that.

I would especially especially love it if it popped out a .tex file that I could edit, since I'm very likely to be using different language on LW than I would in a fancy academic paper.

4Davidmanheim3mo
Seconding the .tex export, since it's much more useful than just getting a pdf!

FYI the screenshots here say "Request feedback" but the actual button currently says "Get feedback". Might trip someone up if they're trying to search for the text.

2Ruby4mo
Thanks!

I feel generally agreeable towards this concept, and also towards the idea of being careful to use phrases as they are defined.

But I feel something else after starting to read the Arbital page. Since you quadruple insisted on it, I went ahead and actually opened the page and started reading it. And several things felt off in quick succession. I'm going to think out loud through those things here.

The first part is the concept of "guarded term". Here's part of the definition of that.

stretching it ... is an unusually strong discourtesy.

...You can't just say t...

2Raemon6mo
FWIW, I think this is an oversensitive frame-control reaction. Like, I agree there is (some) frame control* going on here, and there have been some other Eliezer-pieces that felt frame-control-y enough that I think it's reasonable to be watching out for it. But it seems like you tapped out here at the slightest hint of it, and meanwhile... this term only exists at all because Eliezer thought it was an important concept to crystallize, and it's only in the public discourse right now because Eliezer started talking about it, and refusing to understand what he actually means when he says it just seems super weird to me.

It was written on Arbital, which was always kinda in a weird beta state. Having read a fair amount of Arbital posts, my sense is Eliezer was sort of privately writing the textbook/background reading that he thought was important for the AI Alignment community he wanted to build. Eliezer didn't crosspost it to LW as if it were written/ready for the LW audience, I did, so judging it on those terms feels weird.

(* note: I think frame control is moderately common and isn't automatically bad. I think it might be a good rationalist norm to acknowledge when you're doing it, but that norm isn't at all established and definitely wasn't established in 2015 when this was first written.)

Okay, but how do we get technical terms with precise meanings that are analyzable using propositions that can be investigated and decided using logic and observation? If we're in a context where the meaning of words is automatically eroded by projection into low-dimensional, low-context concepts into whatever the surrounding political forces want, we're not going to get anywhere without being able to fix the meaning of words we need to have a non-obvious technically important use.

I have found throughout my life that there is virtually no correlation between what media other people like (friends, critics, etc) and what I like. Not even a negative correlation; just none. I have given up trying to understand this particular phenomenon.

I share some of your frustrations with what Yudkowsky says, but I really wish you wouldn't reinforce the implicit equating of [Yudkowsky's views] with [what LW as a whole believes]. There's tons of content on here arguing opposing views.

1mukashi6mo
I see, thank you for pointing that out. Do you agree at least that Yudkowsky's view is the most visible view of the LW community? I mean, just count how many posts have been posted with that position versus how many with the opposite.

I'm trying out independent AI alignment research.

2Chris_Leong6mo
Awesome!

Nice post! My main takeaway is "incentives are optimization pressures". I may have had that thought before but this tied it nicely in a bow.

Some editing suggestions/nitpicks;

The bullet point that starts with "As evidence for #3" ends with a hanging "How".

Quite recently, a lot of ideas have sort of snapped together into a coherent mindset.

I would put "for me" at the end of this. It does kind of read to me like you're about to describe for us how a scientific field has recently had a breakthrough.

I don't think I'm following what "Skin in the game" refers to....

1james.lucassen6mo
Thanks! Edits made accordingly. Two notes on the stuff you mentioned that isn't just my embarrassing lack of proofreading:

* The definition of optimization used in Risks From Learned Optimization is actually quite different from the definition I'm using here. They say: "a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system." I personally don't really like this definition, since it leans quite hard on reifying certain kinds of algorithms: when is there "really" explicit search going on? Where is the search space? When does a configuration of atoms constitute an objective function? Using this definition strictly, humans aren't *really* optimizers; we don't have an explicit objective function written down anywhere. Balls rolling down hills aren't optimizers either. But by the definition of optimization I've been using here, I think pretty much all evolved organisms have to be at least weak optimizers, because survival is hard. You have to manage constraints from food and water and temperature and predation etc... the window of action-sequences that lead to successful reproduction is really quite narrow compared to the whole space. Maintaining homeostasis requires ongoing optimization pressure.
* Agree that not all optimization processes fundamentally have to be produced by other optimization processes, and that they can crop up anywhere you have the necessary negentropy reservoir. I think my claim is that optimization processes are by default rare (maybe this is exactly because they require negentropy?). But since optimizers beget other optimizers at a rate much higher than background, we should expect the majority of optimization to arise from other optimization. E

I'm a person who has lived in the Bay area almost the whole time CFAR has existed, and am also moderately (though not intensely) intertwined with that part of the rationalist social network. I was going to write up my own answer but I think you pretty much nailed it with your conclusion here, especially with the part about distinguishing individual people from the institution.

2Yitz6mo
If this is the case, it would be really nice to have confirmation from someone working there.