The image must be hosted!

This is no longer true, right?

(Also, I came here looking for a list of supported image types; I'm trying to insert an SVG, but it's just getting ignored.)

22d

I think most raster image formats should work fine (I'm not surprised that SVGs
don't work, but, like, you can just take a screenshot of it and insert it or
something)

Gotcha, that makes sense! Agreed that an announcement tag is a good solution.

43d

I created this: https://www.lesswrong.com/tag/lw-team-announcements
I'm not 100% sure how well we'll stick to it but you can subscribe to it.

Meta-comment: It might be a good idea to create an official Lightcone-or-whatever LW account that you can publish these kinds of posts from. Then, someone could e.g. subscribe to that user, and get notified of all the official announcement-type posts, without having to subscribe to the personal account of Ruby-or-Ray-etc.

83d

Tagging feels more like the right abstraction here (you can subscribe to tags).
There's a Site Meta tag. We could make a LW Team Announcement tag or something.
We have a general policy against organizational-accounts in most cases (since
they make it less clear who you're talking to, and sort of shift culture into a
more bureaucratic direction). So, I'd avoid a solution where we ourselves did
that. :P

this post [link]

This link is missing!

23d

Fixed.

theoretical progress has been considerably faster than expected, while crossing the theory-practice gap has been mildly slower than expected. (Note that “theory progressing faster than expected, practice slower” is a potential red flag for theory coming decoupled from reality

I appreciate you flagging this. I read the former sentence and my immediate next thought was the heuristic in the parenthetical sentence.

My browser thinks this is an invalid link and won't let me open it.

Totally baseless conjecture that I have not thought about for very long; chaos is identical to Turing completeness. All dynamical systems that demonstrate chaotic behavior are Turing complete (or at least implement an undecidable procedure).

Has anyone heard of an established connection here?

88d

Might look at Wolfram's work. One of the major themes of his CA classification
project is that chaotic (in some sense, possibly not the rigorous ergodic
dynamics definition) rulesets are not Turing-complete; only CAs which are in an
intermediate region of complexity/simplicity have ever been shown to be TC.
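For anyone who wants to poke at the CA side of this, here's a minimal elementary-CA sketch (my own toy code, not from Wolfram's work): rule 30 is the classic "chaotic" rule, while rule 110 is the one Cook proved Turing-complete. Running them proves nothing, of course, but makes the classes easy to eyeball.

```python
def step(cells, rule):
    """Advance one row of an elementary CA under the given rule number.

    Each cell's next value is the bit of `rule` indexed by the 3-cell
    neighborhood (left, self, right) read as a binary number. Wraps around.
    """
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(rule, width=31, steps=15):
    """Run `steps` generations from a single live cell in the middle."""
    row = [0] * width
    row[width // 2] = 1
    history = [row]
    for _ in range(steps):
        row = step(row, rule)
        history.append(row)
    return history

chaotic = run(30)     # irregular, pseudo-random-looking triangle
universal = run(110)  # structured gliders; the Turing-complete rule
```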

FWIW I cannot find your podcast by searching in the app "Pocket Casts" (though I can on Spotify).

19d

In general it seems that currently the podcast can only be found on Spotify.

If anyone's interested in doing an even less formal version of this, I think it would be really useful for me to have semi-regular chats with other people in the alignment space. This could be anything from "you mentor me for an hour a week at the Lightcone office" to "we chat for 15 minutes on zoom every few weeks". I feel reasonably connected to the community, but I think I would strongly benefit from more two-way real-time interaction.

(More info about me: I'm currently doing full-time independent alignment research, but just on my own, with no structure...

212d

If it's easier for you, we can already facilitate that through M&M. Like we
said, as long as both parties agree, you can do whatever makes sense for you :)
But the program might make finding other people easier.

Heh, well, see the aforementioned

it's almost what "doing math" is for me

It also feels like you're asking something like, "what's the most important problem you are trying to solve by having visual perception?" It's kind of just how I navigate the world at all (atoms or math).

But let me take your question at face value and try to answer it.

I think the main answer is something like "semantics". So much of my experiential knowledge is encoded in this physical, 3D physics manner, and when I can match up a symbolic expression with a physical scenario, I get a w...

212d

The first math problem I created was spawned from a visualization on a rare
occasion smoking marijuana. I started imagining cubes falling from the sky, and
just observing this process. I got curious about the chance of one cube landing
on top of another. This led to the problem:
Given a square boundary of size B, place squares of side length S one at a time
completely inside of the boundary at random locations. How many squares on
average will you place before a pair of squares overlaps?
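A quick Monte Carlo sketch of the problem as stated (my interpretation: axis-aligned squares, counting the squares that landed without overlapping before the first overlap occurs):

```python
import random

def avg_squares_before_overlap(B, S, trials=500, seed=0):
    """Estimate the average number of S x S squares placed uniformly at
    random, fully inside a B x B boundary, before one overlaps a
    previously placed square."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        placed = []
        while True:
            # uniform lower-left corner, keeping the square inside the boundary
            x = rng.uniform(0, B - S)
            y = rng.uniform(0, B - S)
            # two axis-aligned S x S squares overlap iff both offsets are < S
            if any(abs(x - px) < S and abs(y - py) < S for px, py in placed):
                break  # the new square overlaps an earlier one
            placed.append((x, y))
        total += len(placed)
    return total / trials
```

Note the degenerate case: when S is at least half of B, any two squares must overlap, so the answer is exactly 1.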

Right, so my question to you is, how *do* you do math?? (This is probably a silly question, but I'd love to hear your humor-me answer.)

411d

Last time I did math was when teaching game theory two days ago. I put a game on
the blackboard. I wrote down an inequality that determined when there would be a
certain equilibrium. Then I used the rules of algebra to simplify the
inequality. Then I discussed why the inequality ended up being that the discount
rate had to be greater than some number rather than less than some number.
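For concreteness, the kind of blackboard computation described might look like this (my reconstruction using a repeated prisoner's dilemma with grim-trigger strategies, not necessarily the actual game from that class):

```latex
% Repeated prisoner's dilemma, stage payoffs T > R > P > S, discount factor \delta.
% Grim trigger: cooperate until the opponent defects, then defect forever.
V_{\text{coop}} = \frac{R}{1-\delta}, \qquad
V_{\text{deviate}} = T + \frac{\delta P}{1-\delta}.

% Cooperation is an equilibrium when V_coop >= V_deviate:
\frac{R}{1-\delta} \ \ge\ T + \frac{\delta P}{1-\delta}
\;\iff\; R \ \ge\ (1-\delta)\,T + \delta P
\;\iff\; \delta \ \ge\ \frac{T-R}{T-P}.
```

The simplified inequality ends up saying the discount factor must be *greater* than some number, matching the shape of the discussion above.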

It sure would be awesome if Lightcone Infrastructure spun up a Mastodon instance for the extended rationalist/EA/AI safety communities.

21mo

There is schelling.pt [https://schelling.pt/], which originated from the
tpot/tcot on twitter, which I've used for the last year and a half.

Hm, it seems pretty dependent on ontology to me – that's pretty much what the set of all states is, an ontology for how the world could be.

In case you missed it, LW 2.0 has feature support for creating sequences. If you hover over your username, the menu has a link to https://www.lesswrong.com/sequencesnew

Is this written against some hypothetical "static world" assumption

Basically exactly that, yeah. But that assumption exists both on a conscious level (in that many people don't consciously realize how much the universe has changed) *and* on a subconscious level, in that many ways the world currently is *feel* stable, even if you know they're not.

I'm psyched to have a podcast version! The narrator did a great job. I was wondering how they were going to handle several aspects of the post, and I liked how they did all of them.

Totally agree. Oliver & co. won tons of Bayes points off me.

Heh, I'm still skimming enough to catch this, but definitely not evaluating arguments.

I'm definitely still *open* to both changing my mind about the best use of terms and also updating the terminology in the sequence (although I suspect that will be quite a non-trivial amount of modified prose). And I think it's best if I don't actually think about it until after I publish another post.

I'd also be much more inclined to think harder about this discussion if there were more than two people involved.

My main goal here has always been "clearly explain the existin...

21mo

Cool cool. A summary of the claims that feel most important to me (for your
convenience, and b/c I'm having fun):
* K-complexity / "algorithmic entropy" is a bad concept that doesn't cleanly
relate to physics!entropy or info!entropy.
* In particular, the K-complexity of a state s is just the length of the
shortest code for s, and this is bad because when s has multiple codes it
should count as "simpler". (A state with five 3-bit codes is simpler than a
state with one 2-bit code.) (Which is why symmetry makes a concept simpler
despite not making its code shorter.)
* If we correct our notion of "complexity" to take multiple codes into account,
then we find that complexity of a state s (with respect to a coding scheme C)
is just the info!cross-entropy H(s,C). Yay!
Separately, some gripes:
* the algorithmic information theory concept is knuckleheaded, and only
approximates info!entropy if you squint really hard, and I'm annoyed about it
* I suspect that a bunch of the annoying theorems in algorithmic information
theory are annoying precisely because of all the squinting you have to do to
pretend that K-complexity was a good idea
And some pedagogical notes:
* I'm all for descriptive accounts of who uses "entropy" for what, but it's
  kinda a subtle situation because:
  * info!entropy is a very general concept,
  * physics!entropy is an interesting special case of that concept (in the
    case where the state is a particular breed of physical macrostate),
  * algo!entropy is a derpy mistake that's sniffing glue in the corner,
  * algo!entropy is sitting right next to a heretofore unnamed concept that
    is another interesting special case of info!(cross-)entropy (in the case
    where the code is universal).
(oh and there's a bonus subtlety that if you port algo!entropy to a situation
where the coding schema has at most one code per state--which is emphatically
not the case in algorithmic informatio
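The multiple-codes claim above is easy to check numerically. Treating the complexity of a state under a coding scheme as the cross-entropy H(s, C), i.e. minus the log of the total weight its codes claim under the 2^-length weighting (my notation for a quick sketch):

```python
import math

def complexity_bits(code_lengths):
    """Cross-entropy-style complexity of a state: -log2 of the total
    probability mass its codes claim under the 2^-length weighting."""
    return -math.log2(sum(2.0 ** -length for length in code_lengths))

five_short_codes = complexity_bits([3, 3, 3, 3, 3])  # five 3-bit codes: ~0.68 bits
one_shorter_code = complexity_bits([2])              # one 2-bit code: 2 bits
# K-complexity ranks the second state simpler (shortest code 2 < 3);
# the cross-entropy view ranks the first one simpler.
```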

quantum mechanics famously provides the measure on phase-space that classical statistical mechanics took as axiomatic

I'd be interested in a citation of what you're referring to here!

51mo

The state-space (for particles) in statmech is the space of possible positions
and momenta for all particles.
The measure that's used is uniform over each coordinate of position and
momentum, for each particle.
This is pretty obvious and natural, but not forced on us, and:
1. You get different, incorrect predictions about thermodynamics (!) if you use
a different measure.
2. The level of coarse graining is unknown, so every quantity of entropy has an
extra "+ log(# microstates per unit measure)" which is an unknown additive
constant. (I think this is separate from the relationship between bits and
J/K, which is a multiplicative constant for entropy -- k_B -- and doesn't
rely on QM afaik.)
On the other hand, Liouville's theorem gives some pretty strong justification
for using this measure, alleviating (1) somewhat:
https://en.wikipedia.org/wiki/Liouville%27s_theorem_(Hamiltonian)
In quantum mechanics, you have discrete energy eigenstates (...in a bound
system, there are technicalities here...) and you can define a microstate to be
an energy eigenstate, which lets you just count things and not worry about
measure. This solves both problems:
1. Counting microstates and taking the classical limit gives the "dx dp" (aka
"dq dp") measure, ruling out any other measure.
2. It tells you how big your microstates are in phase space (the answer is
related to Planck's constant, which you'll note has units of position *
momentum).
This section mostly talks about the question of coarse-graining, but you can see
that "dx dp" is sort of put in by hand in the classical version:
https://en.wikipedia.org/wiki/Entropy_(statistical_thermodynamics)#Counting_of_microstates
I wish I had a better citation but I'm not sure I do.
In general it seems like (2) is talked about more in the l

Did you want your "abstract entropy" to encompass both of these?

Indeed I definitely do.

I would add a big fat disclaimer

There are a bunch of places where I think I flagged relevant things, and I'm curious if these seem like enough to you;

- The whole post is called "abstract entropy", which should tell you that it's at least a little different from any "standard" form of entropy
- The third example, "It helps us understand strategies for (and limits on) file compression", is implicitly about K-complexity
- This whole paragraph: "Many people reading this will have so

11mo

I initially interpreted "abstract entropy" as meaning statistical entropy as
opposed to thermodynamic or stat-mech or information-theoretic entropy. I think
very few people encounter the phrase "algorithmic entropy" enough for it to be
salient to them, so most confusion about entropy in different domains is about
statistical entropy in physics and info theory. (Maybe this is different for LW
readers!)
This was reinforced by the introduction because I took the mentions of file
compression and assigning binary strings to states to be about (Shannon-style)
coding theory, which uses statistical entropy heavily to talk about these same
things and is a much bigger part of most CS textbooks/courses. (It uses phrases
like "length of a codeword", "expected length of a code [under some
distribution]", etc. and then has lots of theorems about statistical entropy
being related to expected length of an optimal code.)
After getting that pattern going, I had enough momentum to see "Solomonoff",
think "sure, it's a probability distribution, presumably he's going to do
something statistical-entropy-like with it", and completely missed the
statements that you were going to be interpreting K complexity itself as a kind
of entropy. I also missed the statement about random variables not being
necessary.
I suspect this would also happen to many other people who have encountered stat
mech and/or information theory, and maybe even K complexity but not the phrase
"algorithmic entropy", but I could be wrong.
A disclaimer is probably not actually necessary, though, on reflection; I care a
lot more about the "minimum average" qualifiers both being included in
statistical-entropy contexts. I don't know exactly how to unify this with
"algorithmic entropy" but I'll wait and see what you do :)

Just mulling over other names, I think "description length" is the one I like best so far. Then "entropy" would be defined as minimum average description length.

31mo

I like "description length".
One wrinkle is that entropy isn't quite minimum average description length -- in
general it's a lower bound on average description length.
If you have a probability distribution that's (2/3, 1/3) over two things, but
you assign fixed binary strings to each of the two, then you can't do better
than 1 bit of average description length, but the entropy of the distribution is
0.92 bits.
Or if your distribution is roughly (.1135, .1135, .7729) over three things, then
you can't do better than 1.23 bits, but the entropy is 1 bit.
You can only hit the entropy exactly when the probabilities are all powers of 2.
(You can fix this a bit in the source-coding context, where you're encoding
sequences of things and don't have to assign fixed descriptions to individual
things. In particular, you can assign descriptions to blocks of N things, which
lets you get arbitrarily close as N -> infinity.)
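The numbers in this comment check out; here's a small sketch (my code) comparing entropy with the average length of an optimal fixed code, using the fact that a Huffman code's average length equals the sum of all merged-node weights:

```python
import heapq
import math

def entropy_bits(p):
    """Shannon entropy of a distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def huffman_avg_length(p):
    """Average codeword length of an optimal fixed (Huffman) code for p.
    Repeatedly merge the two smallest weights; the average length is the
    sum of all merged weights."""
    heap = list(p)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

# (2/3, 1/3): best fixed code averages 1 bit, entropy ~0.918 bits
# (.1135, .1135, .7730): best fixed code averages ~1.227 bits, entropy ~1 bit
```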

That makes sense. In my post I'm saying that entropy is whatever binary string assignment you want, which does *not* depend on the probability distribution you're using to weight things. And then if you want the *minimum average* string length, it becomes a function of the probability distribution.

31mo

Ah, I missed this on a first skim and only got it recently, so some of my
comments are probably missing this context in important ways. Sorry, that's on
me.

one of my personal spicy takes...

Omfg, I love hearing your spicy takes. (I think I remember you advocating hard tabs, and trinary logic.)

ə, pronounced "schwa", for 1/e

`lug`, pronounced /ləg/, for log base ə

`nl` for "negative logarithm"

XD XD guys I literally can't

Extremely pleased with this reception! I indeed feel pretty seen by it.

I think he suggested that this naming fits with something he wants to do with K complexity

I didn't mean something I'm doing, I meant that the field of K-complexity just straightforwardly uses the word "entropy" to refer to it. Let me see if I can dig up some references.

31mo

K-complexity is apparently sometimes called "algorithmic entropy" (but not just
"entropy", I don't think?)
Wiktionary quotes Niels Henrik Gregersen:
I think this might be the crux!
Note the weird type mismatch: "the statistical entropy of an ensemble [...] the
ensemble average of the algorithmic entropy of its members".
So my story would be something like the following:
1. Many fields (thermodynamics, statistical mechanics, information theory,
probability) use "entropy" to mean something equivalent to "the expectation
of -log(p) for a distribution p". Let's call this "statistical entropy", but
in practice people call it "entropy".
2. Algorithmic information theorists have an interestingly related but distinct
concept, which they sometimes call "algorithmic entropy".
Whoops, hang on a sec. Did you want your "abstract entropy" to encompass both of
these?
If so, I didn't realize that until now! That changes a lot, and I apologize
sincerely if waiting for the K-complexity stuff would've dissipated a lot of the
confusion.
Things I think contributed to my confusion:
(1) Your introduction only directly mentions / links to domain-specific types of
entropy that are firmly under (type 1) "statistical entropy"
(2) This intro post doesn't yet touch on (type 2) algorithmic entropy, and is
instead a mix of type-1 and your abstract thing where description length and
probability distribution are decoupled.
(3) I suspect you were misled by the unpedagogical phrase "entropy of a
macrostate" from statmech, and didn't realize that (as used in that field) the
distribution involved is determined by the macrostate in a prescribed way (or is
the macrostate).
I would add a big fat disclaimer that this series is NOT just limited to type-1
entropy, and (unless you disagree with my taxonomy here) emphasize heavily that
you're including type-2 entropy.

Part of what confuses me about your objection is that it seems like averages of things can *usually* be treated the same as the individual things. E.g. an average number of apples is a number of apples, and average height is a height ("Bob is taller than Alice" is treated the same as "men are taller than women"). The sky is blue, by which we mean that the average photon frequency is in the range defined as blue; we also just say "a blue photon".

A possible counter-example I can think of is temperature. Temperature is the average [something like] kinetic energ...

31mo

I think it's different because entropy is an expectation of a thing which
depends on the probability distribution that you're using to weight things.
Like, other things are maybe... A is the number of apples, sum of p×A is the
expected number of apples under distribution p, sum of q×A is the expected
number of apples under distribution q.
But entropy is... -log(p) is a thing, and sum of p × -log(p) is the entropy.
And the sum of q × -log(p) is... not entropy! (It's "cross-entropy")
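A few toy lines (numbers are mine) to make the type distinction concrete: entropy uses the same distribution for the weights and the -log terms, while cross-entropy mixes two distributions.

```python
import math

def entropy(p):
    # sum of p × -log(p): weights and surprisals from the SAME distribution
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(q, p):
    # sum of q × -log(p): weights from q, surprisals from p
    return -sum(qi * math.log2(pi) for qi, pi in zip(q, p))

p = [0.5, 0.25, 0.25]
q = [0.8, 0.1, 0.1]
# Gibbs' inequality: cross_entropy(q, p) >= entropy(q), equal only when q = p.
```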

(Let's not call it "probability" because that has too much baggage.)

This aside raises concerns for me, like it makes me worry that maybe we're more deeply not on the same page. It seems to me like the weighing is just straight-forward probability, and that it's important to call it that.

11mo

I think I was overzealous with this aside and regret it.
I worry that the word "probability" has connotations that are too strong or are
misleading for some use cases of abstract entropy.
But this is definitely probability in the mathematical sense, yes.
Maybe I wish mathematical "probability" had a name with weaker connotations.

One thing I'm not very confident about is how working scientists use the concept of "macrostate". If I had good resources for that I might change some of how the sequence is written, because I don't want to create any confusion for people who use this sequence to learn and then go on to work in a related field. (...That said, it's not like people aren't already confused. I kind of expect most working scientists to be confused about entropy outside their exact domain's use.)

11mo

I think it might be a bit of a mess, tbh.
In probability theory, you have outcomes (individual possibilities), events
(sets of possibilities), and distributions (assignments of probabilities to all
possible outcomes).
"microstate": outcome.
"macrostate": sorta ambiguous between event and distribution.
"entropy of an outcome": not a thing working scientists or mathematicians say,
ever, as far as I know.
"entropy of an event": not a thing either.
"entropy of a distribution": that's a thing!
"entropy of a macrostate": people say this, so they must mean a distribution
when they are saying this phrase.
I think you're within your rights to use "macrostate" in any reasonable way that
you like. My beef is entirely about the type signature of "entropy" with regard
to distributions and events/outcomes.

Here's another thing that might be adding to our confusion. It just so happens that in the particular system that is *this* universe, all states with the same total energy are equally likely. That's not true for most systems (which don't even have a concept of energy), and so it doesn't seem like a part of abstract entropy to me. So e.g. macrostates don't necessarily contain microstates of equal probability (which I think you've implied a couple times).

11mo

Honestly, I'm confused about this now.
I thought I recalled that "macrostate" was only used for the "microcanonical
ensemble" (fancy phrase for a uniform-over-all-microstates-with-same-(E,N,V)
probability distribution), but in fact it's a little ambiguous.
Wikipedia says
which implies the microcanonical ensemble (the others are parametrized by things
other than (E, N, V) triples), but then later it talks about both the canonical
and microcanonical ensemble.
I think a lot of our confusion comes from the way physicists equivocate between
macrostates as a set of microstates (with the probability distribution
unspecified) and as a probability distribution. Wiki's "definition" is
ambiguous: a particular (E, N, V) triple specifies both a set of microstates
(with those values) and a distribution (uniform over that set).
In contrast, the canonical ensemble is a probability distribution defined by a
triple (T,N,V), with each microstate having probability proportional to exp(- E
/ kT) if it has particle number N and volume V, otherwise probability zero. I'm
not sure what "a macrostate specified by (T,N,V)" should mean here: either the
set of microstates with (N, V) (and any E), or the non-uniform distribution I
just described.
(By the way: note that when T is being used here, it doesn't mean the average
energy, kinetic or otherwise. kT isn't the actual energy of anything, it's just
the slope of the exponential decay of probability with respect to energy. A
consequence of this definition is that the expected kinetic energy in some
contexts is proportional to temperature, but this expectation is for a
probability distribution over many microstates that may have more or less
kinetic energy than that. Another consequence is that for large systems, the
average kinetic energy of particles in the actual true microstate is very likely
to be very close to (some multiple of) kT, but this is because of the law of
large numbers and is not true for small systems. Note that there's two dif

I'm not quite sure what the cruxes of our disagreement are yet. So I'm going to write up some more of how I'm thinking about things, which I think might be relevant.

When we decide to model a system and assign its states entropy, there's a question of what set of states we're including. Often, we're modelling part of the real universe. The real universe is in only one state at any given time. But we're ignorant of a bunch of parts of it (and we're also ignorant about exactly what states it will evolve into over time). So to do some analysis, we decide on so...

11mo

I think the crux of our disagreement [edit: one of our disagreements] is whether
the macrostate we're discussing can be chosen independently of the "uncertainty
model" at all.
When physicists talk about "the entropy of a macrostate", they always mean
something of the form:
* There are a bunch of p's that add up to 1. We want the sum of p × (-log p)
over all p's. [EXPECTATION of -log p aka ENTROPY of the distribution]
They never mean something of the form:
* There are a bunch of p's that add up to 1. We want the sum of p × (-log p)
over just some of the p's. [???]
Or:
* There are a bunch of p's that add up to 1. We want the sum of p × (-log p)
over just some of the p's, divided by the sum of p over the same p's.
[CONDITIONAL EXPECTATION of -log p given some event]
Or:
* There are a bunch of p's that add up to 1. We want the sum of (-log p) over
just some of the p's, divided by the number of p's we included. [ARITHMETIC
MEAN of -log p over some event]
This also applies to information theorists talking about Shannon entropy.
I think that's the basic crux here.
This is perhaps confusing because "macrostate" is often claimed to have
something to do with a subset of the microstates. So you might be forgiven for
thinking "entropy of a macrostate" in statmech means:
* For some arbitrary distribution p, consider a separately-chosen "macrostate"
A (a set of outcomes). Compute the sum of p × (-log p) over every p whose
corresponding outcome is in A, maybe divided by the total probability of A or
something.
But in fact this is not what is meant!
Instead, "entropy of a macrostate" means the following:
* For some "macrostate", whatever the hell that means, we construct a
probability distribution p. Maybe that's the macrostate itself, maybe it's a
distribution corresponding to the macrostate, usage varies. But the
macrostate determines the distribution, either way. Compute the sum of p ×
(-log p) over every p.
EDIT
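To illustrate with toy numbers of mine: the candidate formulas listed above really do give different answers, which is why the distinction matters.

```python
import math

# Toy distribution; the "macrostate" A is viewed as a subset of outcomes.
p = {"s1": 0.5, "s2": 0.25, "s3": 0.125, "s4": 0.125}
A = {"s1", "s2"}

# What physicists/information theorists mean by "entropy":
# the expectation of -log p over the WHOLE distribution.
entropy = sum(pi * -math.log2(pi) for pi in p.values())  # 1.75 bits

# The forms they never mean, which all come out different here:
partial_sum = sum(p[s] * -math.log2(p[s]) for s in A)        # sum over some p's
p_A = sum(p[s] for s in A)
conditional = partial_sum / p_A                              # conditional expectation
arithmetic_mean = sum(-math.log2(p[s]) for s in A) / len(A)  # arithmetic mean
```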

The historical baggage is something that tripped me up, too. In an upcoming post I have a section about classical thermodynamic entropy, including an explanation of the weird units!

11mo

That's great, I subscribed and looking forward to it!

Nice catches. I love that somebody double-checked all the binary strings. :)

I think it's also important for my definition of optimization (coming later), because individual microstates do deserve to be assigned a specific level of optimization.

That's a reasonable stance, but one of the main messages of the sequence is that we can *start* with the concept of individual states having entropy assigned to them, and derive everything else from there! This is especially relevant to the idea of using Kolmogorov complexity as entropy. Calling it "surprisal" or "information" has an information-theoretic connotation to it that I think doesn't apply in all contexts.

41mo

I'm fine with choosing some other name, but I think all of the different
"entropies" (in stat mech, information theory, etc) refer to weighted averages
over a set of states, whose probability-or-whatever adds up to 1. To me that
suggests that this should also be true of the abstract version.
So I stand by the claim that the negative logarithm of probability-or-whatever
should have some different name, so that people don't get confused by the
([other thing], entropy) → (entropy, average entropy) terminology switch.
I think "average entropy" is also (slightly) misleading because it suggests that
the -log(p)'s of individual states are independent of the choice of which
microstates are in your macrostate, which I think is maybe the root problem I
have with footnote 17. (See new comment in that subthread)

21mo

I think it's also important for my definition of optimization (coming later),
because individual microstates do deserve to be assigned a specific level of
optimization.

Maybe a slightly better title to the post would be "Plans are predictions, not optimization targets"? I found the "plans are predictions" part of the post to be the most insightful, and the rewording also removes a "should".

21mo

Good suggestion! Change made. Thank you!

Loved this post. Both because I think this is a valuable set of reasoning heuristics, and because I read it in your voice, which made it feel something like a rationalist standup routine.

Should there be an "advice for new orgs" tag?

52mo

For the record, my sense of the biggest single problem with new orgs is that
they don't know that they should read Paul Graham's essays, and more importantly
they don't know that they should watch the YC video lectures
[https://www.youtube.com/watch?v=CBYhVcO4WgI]. I feel in my conversations like I
just don't have a shared referent for what a 'functional organization' looks
like, these people keep talking about hiring as though it's a good thing, about
looking professional, and so on. No, top priority is small number of people, and
getting the key thing done impressively fast, whilst letting everything else be
on fire.
If people knew they'd be judged for not having read that stuff, I feel like a
lot of problems would just go away.

The Role of Deliberate Practice in the Acquisition of Expert Performance (PDF)

This link seems broken (though a google search finds many copies of the PDF).

To anyone landing on this page, the CFAR handbook is now available on LessWrong as a native sequence.

I'd prefer the S'wentworth Law of Measurement

22mo

Hah, I was just guessing at the real name.

It might be useful to add a quick summary of how arXiv works. I vaguely had the impression that anyone could upload PDFs to it, but some of the comments seem to pretty solidly disagree with that.

I would especially especially love it if it popped out a .tex file that I could edit, since I'm very likely to be using different language on LW than I would in a fancy academic paper.

43mo

Seconding the .tex export, since it's much more useful than just getting a pdf!

FYI the screenshots here say "Request feedback" but the actual button currently says "Get feedback". Might trip someone up if they're trying to search for the text.

24mo

Thanks!

I feel generally agreeable towards this concept, and also towards the idea of being careful to use phrases as they are defined.

But I feel something else after starting to read the Arbital page. Since you *quadruple* insisted on it, I went ahead and actually opened the page and started reading it. And several things felt off in quick succession. I'm going to think out loud through those things here.

The first part is the concept of "guarded term". Here's part of the definition of that.

stretching it ... is an unusually strong discourtesy.

...You can't just say t...

26mo

FWIW, I think this is an oversensitive frame-control reaction. Like, I agree
there is (some) frame control* going on here, and there have been some other
Eliezer-pieces that felt more-frame-control-y, enough that I think it's
reasonable to be watching out for it.
But it seems like you tapped out here at the slightest hint of it, and
meanwhile... this term only exists at all because Eliezer thought it was an
important concept to crystallize, and it's only in the public discourse right
now because Eliezer started talking about it, and refusing to understand what he
actually means when he says it just seems super weird to me.
It was written on arbital which was always kinda in a weird beta state. Having
read a fair amount of arbital posts, my sense is Eliezer was sort of privately
writing the textbook/background reading that he thought was important for the AI
Alignment community he wanted to build. Eliezer didn't crosspost it to LW as if
it were written/ready for the LW audience, I did, so judging it on those terms
feels weird.
(* note: I think frame control is moderately common, isn't automatically bad, I
think it might be a good rationalist-norm to acknowledge when you're doing it
but that norm isn't at all established and definitely wasn't established in 2015
when this was first written.)

Okay, but how do we get technical terms with precise meanings that are analyzable using propositions that can be investigated and decided using logic and observation? If we're in a context where the meaning of words is automatically eroded by projection into low-dimensional, low-context concepts into whatever the surrounding political forces want, we're not going to get anywhere without being able to fix the meaning of words we need to have a non-obvious technically important use.

I have found throughout my life that there is virtually no correlation between what media other people like (friends, critics, etc) and what I like. Not even a negative correlation; just none. I have given up trying to understand this particular phenomenon.

I share some of your frustrations with what Yudkowsky says, but I really wish you wouldn't reinforce the implicit equating of [Yudkowsky's views] with [what LW as a whole believes]. There's *tons* of content on here arguing opposing views.

16mo

I see, thank you for pointing that out. Do you agree at least that Yudkowsky's
view is the most visible view of the LW community? I mean, just count how many
posts have been posted with that position versus how many with the opposite.

I'm trying out independent AI alignment research.

26mo

Awesome!

Nice post! My main takeaway is "incentives are optimization pressures". I may have had that thought before but this tied it nicely in a bow.

Some editing suggestions/nitpicks;

The bullet point that starts with "As evidence for #3" ends with a hanging "How".

Quite recently, a lot of ideas have sort of snapped together into a coherent mindset.

I would put "for me" at the end of this. It does kind of read to me like you're about to describe for us how a scientific field has recently had a breakthrough.

I don't think I'm following what "Skin in the game" refers to....

16mo

Thanks! Edits made accordingly. Two notes on the stuff you mentioned that isn't
just my embarrassing lack of proofreading:
* The definition of optimization used in Risks From Learned Optimization is
actually quite different from the definition I'm using here. They say:
"a system is an optimizer if it is internally searching through a search
space (consisting of possible outputs, policies, plans, strategies, or
similar) looking for those elements that score high according to some
objective function that is explicitly represented within the system."
I personally don't really like this definition, since it leans quite hard on
reifying certain kinds of algorithms - when is there "really" explicit search
going on? Where is the search space? When does a configuration of atoms
constitute an objective function? Using this definition strictly, humans
aren't *really* optimizers, we don't have an explicit objective function
written down anywhere. Balls rolling down hills aren't optimizers either.
But by the definition of optimization I've been using here, I think pretty
much all evolved organisms have to be at least weak optimizers, because
survival is hard. You have to manage constraints from food and water and
temperature and predation etc... the window of action-sequences that lead to
successful reproduction is really quite narrow compared to the whole space.
Maintaining homeostasis requires ongoing optimization pressure.
* Agree that not all optimization processes fundamentally have to be produced
by other optimization processes, and that they can crop up anywhere you have
the necessary negentropy reservoir. I think my claim is that optimization
processes are by default rare (maybe this is exactly because they require
negentropy?). But since optimizers beget other optimizers at a rate much
higher than background, we should expect the majority of optimization to
arise from other optimization. E

I'm a person who has lived in the Bay area almost the whole time CFAR has existed, and am also moderately (though not intensely) intertwined with that part of the rationalist social network. I was going to write up my own answer but I think you pretty much nailed it with your conclusion here, especially with the part about distinguishing individual people from the institution.

26mo

If this is the case, it would be really nice to have confirmation from someone
working there.

Is there a bug around resizing images? Previously I've found that my image size choice is ignored unless the image has a caption. But for gifs, it seems to ignore it even if there is a caption, instead rendering the image at the full width of the article.