24th May 2022 · 123 comments
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a special post for quick takes by leogao.


i've noticed a life hyperparameter that affects learning quite substantially. i'd summarize it as "willingness to gloss over things that you're confused about when learning something". as an example, suppose you're modifying some code and it seems to work but also you see a warning from an unrelated part of the code that you didn't expect. you could either try to understand exactly why it happened, or just sort of ignore it.

reasons to set it low:

  • each time your world model is confused, that's an opportunity to get a little bit of signal to improve your world model. if you ignore these signals you increase the length of your feedback loop, and make it take longer to recover from incorrect models of the world.
  • in some domains, it's very common for unexpected results to actually be a hint at a much bigger problem. for example, many bugs in ML experiments cause results that are only slightly weird, but if you tug on the thread of understanding why your results are slightly weird, this can cause lots of your experiments to unravel. and doing so earlier rather than later can save a huge amount of time
  • understanding things at least one level of abstraction down often lets you do things more
... (read more)
Gunnar_Zarncke (2 points, 2mo):
This seems to be related to Goldfish Reading. Or maybe complementary. In Goldfish Reading one reads the same text multiple times, not trying to understand it all at once or remember everything, i.e., intentionally ignoring confusion. But in a structured form to avoid overload. 
leogao (4 points, 2mo):
Yeah, this seems like a good idea for reading - lets you get the best of both worlds. Though it works for reading mostly because it doesn't take that much longer to do so. This doesn't translate as directly to e.g. what to do when debugging code or running experiments.
Johannes C. Mayer (1 point, 2mo):
I think it's very important to keep track of what you don't know. It can be useful to not try to get the best model when that's not the bottleneck. But I think it's always useful to explicitly store the knowledge of what models are developed to what extent.
Johannes C. Mayer (1 point, 2mo):
The algorithm that I have been using, where what to understand to what extent is not a hyperparameter, is to just solve the actual problems I want to solve, and then always slightly overdo the learning, i.e. I would always learn a bit more than necessary to solve whatever subproblem I am solving right now. E.g. I am just trying to make a simple server, and then I learn about the protocol stack. This has the advantage that I am always highly motivated to learn something, as the path to the problem on the graph of justifications is always pretty short. It also ensures that all the things that I learn are not completely unrelated to the problem I am solving. I am pretty sure that if you had perfect control over your motivation this would not be the best algorithm, but given that you don't, it is the best algorithm I have found so far.

any time someone creates a lot of value without capturing it, a bunch of other people will end up capturing the value instead. this could be end consumers, but it could also be various middlemen. it happens not infrequently that someone decides not to capture the value they produce in the hopes that the end consumers get the benefit, but in fact the middlemen capture the value instead

Matt Goldenberg (5 points, 20d):
can you give examples?
leogao (6 points, 20d):
an example: open source software produces lots of value. this value is partly captured by consumers who get better software for free, and partly by businesses that make more money than they would otherwise. the most clear cut case is that some businesses exist purely by wrapping other people's open source software, doing advertising and selling it for a handsome profit; this makes the analysis simpler, though to be clear the vast majority of cases are not this egregious. in this situation, the middleman company is in fact creating value (if a software is created in a forest with no one around to use it, does it create any value?) by using advertising to cause people to get value from software. in markets where there are consumers clueless enough to not know about the software otherwise (e.g legacy companies), this probably does actually create a lot of counterfactual value. however, most people would agree that the middleman getting 90% of the created value doesn't satisfy our intuitive notion of fairness. (open source developers are more often trying to have the end consumers benefit from better software, not for random middlemen to get rich off their efforts) and if advertising is commoditized, then this problem stops existing (you can't extract that much value as an advertising middleman if there is an efficient market with 10 other competing middlemen), and so most of the value does actually accrue to the end user.
ryan_greenblatt (2 points, 20d):
Often tickets will be sold at prices considerably lower than the equilibrium price and thus ticket scalpers will buy the tickets and then resell for a high price. That said, I don't think this typically occurs because the company/group originally selling the tickets wanted consumers to benefit, it seems more likely that this is due to PR reasons (it looks bad to sell really expensive tickets). This is actually a case where it seems likely that the situation would be better for consumers if the original seller captured the value. (Because buying tickets from random scalpers is annoying.)
Viliam (2 points, 17d):
I wonder how much of this is the PR reasons, and how much something else... for example, the scalpers cooperating (and sharing a part of their profits) with the companies that sell tickets. To put it simply, if I sell a ticket for $200, I need to pay a tax for the $200. But if I sell the same ticket for $100 and the scalper re-sells it for $200, then I only need to pay the tax for $100, which might be quite convenient if the scalper... also happens to be me? (More precisely, some of the $100 tickets are sold to genuine 3rd party scalpers, but most of them I sell to myself... but according to my tax reports, all of them were sold to the 3rd party.)
the gears to ascension (2 points, 20d):
ticket scalping is bad and we should find some sort of fully distributed market mechanism that makes scalping close to impossible without requiring the ticket seller to capture the value. it ought to be possible to gift value to end customers rather than requiring the richest to be the ones who get the benefit; how can that be achieved?
ryan_greenblatt (2 points, 19d):
The simple mechanism is:

  • Charge market prices (auction or just figure out the equilibrium price normally)
  • Redistribute the income uniformly to some group. Aka UBI.

Of course, you could make the UBI be to (e.g.) Taylor Swift fans in particular, but this is hardly a principled approach to redistribution. Separately, musicians (and other performers) might want to subsidize tickets for extremely hard core fans because these fans add value to the event (by being enthusiastic). For this, the main difficulty is that it's hard to cheaply determine if someone is a hard core fan. (In principle, being prepared to buy tickets before they run out could be an OK proxy for this, but it fails in practice, at least for buying tickets online.) More discussion is in this old planet money episode.
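(a toy sketch of the mechanism described above, not from the comment itself - the function name, bids, and numbers are all made up for illustration: a uniform-price auction for k tickets followed by an equal rebate of the revenue)

```python
# Toy illustration: sell k tickets at the market-clearing price, then rebate the revenue equally.
def sell_and_rebate(bids, k, rebate_group_size):
    ranked = sorted(bids, reverse=True)
    price = ranked[k]                       # highest losing bid sets the clearing price
    winners = [b for b in bids if b > price][:k]
    revenue = price * len(winners)
    rebate = revenue / rebate_group_size    # the "UBI" share per member of the chosen group
    return price, winners, rebate

price, winners, rebate = sell_and_rebate(bids=[300, 250, 120, 80, 60, 40], k=3, rebate_group_size=6)
print(price, len(winners), rebate)          # 80, 3 winners, 40 per person
```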
Dagon (2 points, 20d):
It's worth examining whether "capturing value" and "providing value" are speaking of the same thing.  In many cases, the middlemen will claim that they're actually providing the majority of the value, in making the underlying thing useful or available.  They may or may not be right. For most goods, it's not clear how much of the consumer use value comes from the idea, the implementation of the idea, or from the execution of the delivery and packaging.  Leaving aside government-enforced exclusivity, there are usually reasons for someone to pay for the convenience, packaging, and bundling of such goods. I worked (long ago) in physical goods distribution for toys and novelties.  I was absolutely and undeniably working for a middleman - we bought truckloads of stuff from factories, repackaged it for retail, and sold it at a significant markup to retail stores, who marked it up again and sold it to consumers.  Our margins were good, but all trades were voluntary and I don't agree with a framing that we were "capturing" existing value rather than creating value in connecting supply with demand.
StartAtTheEnd (1 point, 20d):
All value is finite, and every time value is used, it decreases. The middlemen are merely causing the thing to die faster. For instance, if you discover a nice beach which hasn't been ruined with plastic and glass bottles yet, and make it into a popular area, you won't get to spend many happy summers at that place. If you find oil and sell it, are you creating value, or are you destroying value? I think both perspectives are valid. But since the openness of information in the modern world makes it so that everything which can be exploited will be exploited, up until the point that exploitation is no longer possible (as with the ruined beach), I strongly dislike unsustainable exploitation and personally tend toward the "destroying value" view. And if you want something to worry about, let it be premature exploitation. X 'creates' value and chooses not to exploit it prematurely, but then Y will come along and take it, so X is forced to capitalize on it early. Now you have a Moloch problem on your hands.
leogao (2 points, 20d):
of course, this is more a question about equilibria than literal transactions. suppose you capture most of the value and then pay it back out to users as a dividend: the users now have more money with which they could pay a middleman, and a middleman that could have extracted some amount of value originally can still extract that amount of value in this new situation. we can model this as a game of ultimatum between the original value creator and the middlemen. if the participation of the OVC and middleman are both necessary, the OVC can bargain for half the value in an iterated game / as FDT agents. however, we usually think of the key differentiating factor between the OVC and middlemen as the middlemen being more replaceable, so the OVC should be able to bargain for a lot more. (see also: commoditizing your complement) so to ensure that the end users get most of the value, you need to either ensure that all middleman roles are commoditized, or precommit to only provide value in situations where the end user can actually capture most of the value
Dagon (2 points, 20d):
The equilibrium comprises literal transactions, right? You should be able to find MANY representative specific examples to analyze, which would help determine whether your model of value is useful in these cases. My suspicion is that you're trying to model "value" as something that's intrinsic, not something which is a relation between individuals, which means you are failing to see that the packaged/paid/delivered good is actually distinct and non-fungible with the raw/free/open good, for the customers who choose that route. Note that in the case of open-source software, it's NOT a game of ultimatum, because both channels exist simultaneously and neither has the option to deny the other. A given consumer paying for one does not prevent some other customer (or even the same customer in parallel) using the direct free version.
leogao (2 points, 19d):
I make no claim to fungibility or lack of value created by middlemen.

saying "sorry, just to make sure I understand what you're saying, do you mean [...]" more often has been very valuable

Viliam (2 points, 2mo):
yeah, turns off the combat mode
leogao (2 points, 2mo):
more importantly, both i and the other person get more out of the conversation. almost always, there are subtle misunderstandings and the rest of the conversation would otherwise involve a lot of talking past each other. you can only really make progress when you're actually engaging with the other person's true beliefs, rather than a misunderstanding of their beliefs.

hypothesis: intellectual progress mostly happens when bubbles of non tribalism can exist. this is hard to safeguard because tribalism is a powerful strategy, and therefore insulating these bubbles is hard. perhaps it is possible for there to exist a monopoly on tribalism to make non tribal intellectual progress happen, in the same way a monopoly on violence makes it possible to make economically valuable trade without fear of violence

Daniel Kokotajlo (6 points, 5mo):
Continuing the analogy: You'd want there to be a Tribe, or perhaps two or more Tribes, that aggressively detect and smack down any tribalism that isn't their own. It needs to be the case that e.g. when some academic field starts splintering into groups that stereotype and despise each other, or when people involved in the decision whether to X stop changing their minds frequently and start forming relatively static 'camps,' the main Tribe(s) notice this and squash it somehow.  And/or maybe arrange things so it never happens in the first place. I wonder if this sorta happens sometimes when there is an Official Religion?
leogao (4 points, 5mo):
another way to lean really hard into the analogy: you could have a Tribe which has a constitution/laws that dictate what kinds of argument are ok and which aren't, has a legislative branch that constantly thinks about what kinds of arguments are non truthseeking and should be prohibited, a judicial branch that adjudicates whether particular arguments were truthseeking by the law, and has the monopoly on tribalism in that it is the only entity that can legitimately silence people's arguments or (akin to exile) demand that someone be ostracized. there would also be foreign relations/military (defending the continued existence of the Tribe against all the other tribes out there, many of which will attempt to destroy the Tribe via very nontruthseeking means)
leogao (5 points, 5mo):
unfortunately this is pretty hard to implement. free speech/democracy is a very strong baseline but still insufficient. the key property we want is a system where true things systematically win over false things (even when the false things appeal to people's biases), and it is sufficiently reliable at doing so and therefore intellectually legitimate that participants are willing to accept the outcome of the process even when it disagrees with what they started with. perhaps there is some kind of debate protocol that would make this feasible?
Viliam (2 points, 5mo):
Prediction markets? Generally, track people's previous success rates about measurable things.
leogao (4 points, 5mo):
prediction markets have two major issues for this use case. one is that prediction markets can only tell you whether people have been calibrated in the past, which is useful signal and filters out pundits but isn't very highly reliable for out of distribution questions (for example, ai x-risk). the other is that they don't really help much with the case where all the necessary information is already available but it is unclear what conclusion to draw from the evidence (and where having the right deliberative process to make sure the truth comes out at the end is the cat-belling problem). prediction markets can only "pull information from the future" so to speak.
Viliam (2 points, 5mo):
BTW, I like the "monopoly on violence" analogy. We can extend it to include verbal violence -- you can have an environment where it is okay to yell at people for being idiots, or you can have an environment where it is okay to yell at people for being politically incorrect. Both will shape the intellectual development in certain directions. Conflicts arise when you don't have a monopoly, so sometimes people get yelled at for being idiots, other times for being politically incorrect, and then you have endless "wars" about whether we should or shouldn't study a politically sensitive topic X with an open mind, both sides complaining about lack of progress (from their perspective). The more mutually contradictory constraints you have, the more people will choose the strategy "let's not do anything unusual", because it is too likely to screw up according to some of the metrics and get yelled at.

random fun experiment: accuracy of GPT-4 on "Q: What is 1 + 1 + 1 + 1 + ...?\nA:"
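(a rough sketch of how an experiment like this could be run - not necessarily the exact setup used here; it assumes the OpenAI Python client, the "gpt-4" model name, greedy sampling as a stand-in for taking the highest-logprob token, and n up to 200)

```python
# Hypothetical reproduction sketch; the original experimental setup isn't specified.
from openai import OpenAI

client = OpenAI()

def model_sum(n: int, model: str = "gpt-4") -> str:
    # Build the prompt as in the post: "Q: What is 1 + 1 + ... + 1?\nA:"
    prompt = "Q: What is " + " + ".join(["1"] * n) + "?\nA:"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # greedy decoding, roughly "highest logprob token"
        max_tokens=4,
    )
    return resp.choices[0].message.content.strip()

correct = [model_sum(n) == str(n) for n in range(1, 201)]
print(f"accuracy for n = 1..200: {sum(correct) / len(correct):.1%}")
```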

leogao (4 points, 1y):
[plot: blue = highest-logprob numerical token; orange = the line y = x]
aphyer (4 points, 1y):
...I am suddenly really curious what the accuracy of humans on that is.
Richard_Kennaway (7 points, 1y):
'Can you do Addition?' the White Queen asked. 'What's one and one and one and one and one and one and one and one and one and one?' 'I don't know,' said Alice. 'I lost count.'
niknoble (1 point, 1y):
This is a cool idea. I wonder how it's able to do 100, 150, and 200 so well. I also wonder what are the exact locations of the other spikes?
niknoble (1 point, 1y):
Oh, I see your other graph now. So it just always guesses 100 for everything in the vicinity of 100.

it's often stated that believing that you'll succeed actually causes you to be more likely to succeed. there's an immediately obvious alternative explanation for this - survivorship bias. obviously most people who win the lottery will have believed that buying lottery tickets is a good idea, but that doesn't mean we should take that advice. so we should consider the plausible mechanisms of action.

first, it is very common for people with latent ability to underestimate their latent ability. in situations where the cost of failure is low, it seems net positive to at least take seriously the hypothesis that you can do more than you think you can. (also keeping in mind that we often overestimate the cost of failure). there are also deleterious mental health effects to believing in a high probability of failure, and then bad mental health does actually cause failure - it's really hard to give something your all if you don't really believe in it.

belief in success also plays an important role in signalling. if you're trying to make some joint venture happen, you need to make people believe that the joint venture will actually succeed (opportunity costs exist). when assessing the likelihood of success... (read more)

Is it a very universal experience to find it easier to write up your views if it's in response to someone else's writeup? Seems like the kind of thing that could explain a lot about how research tends to happen if it were a pretty universal experience.

ryan_greenblatt (6 points, 3mo):
I think so/I have this. (I would emoji react for a less heavy response, but that doesn't work on older short forms) The corollary is that it's really annoying to respond to widely held views or frames which aren't clearly written up anywhere. Particularly if these views are very imprecise and confused.
leogao (6 points, 3mo):
new galaxy brain hypothesis of how research advances: progress happens when people feel unhappy about a bad but popular paper and want to prove it wrong (or when they feel like they can do even better than someone else)

this explains:

  • why it's often necessary to have bad incremental papers that don't introduce any generalizable techniques (nobody will care about the followup until it's refuting the bad paper)
  • why so much of academia exists to argue that other academics are wrong and bad
  • why academics sometimes act like things don't exist unless there's a paper about them, even though the thing is really obvious
MikkW (2 points, 3mo):
This subjectively seems to me to be the case.
Pat Myron (1 point, 3mo):
https://xkcd.com/386/

Since there are basically no alignment plans/directions that I think are very likely to succeed, and adding "of course, this will most likely not solve alignment and then we all die, but it's still worth trying" to every sentence is low information and also actively bad for motivation, I've basically recalibrated my enthusiasm to be centered around "does this at least try to solve a substantial part of the real problem as I see it". This is, at least for me, the most productive mindset to be in, but I'm slightly worried people might confuse this for me having a low P(doom), or being very confident in specific alignment directions, or so on, hence this post that I can point people to.

I think this may also be a useful emotional state for other people with similar P(doom) and who feel very demotivated by that, which impacts their productivity.

philosophy: while the claims "good things are good" and "bad things are bad" at first appear to be compatible with each other, actually we can construct a weird hypothetical involving exact clones that demonstrates that they are fundamentally inconsistent with each other

law: could there be ambiguity in "don't do things that are bad as determined by a reasonable person, unless the thing is actually good?" well, unfortunately, there is no way to know until it actually happens

Dagon (6 points, 2d):
I think I need to hear more context (and likely more words in the sentences) to understand what inconsistency you're talking about. "good things are good" COULD be just a tautology, with the assumption that "good things" are relative to a given agent, and "good" is furtherance of the agent's preferences. Or it could be a hidden (and false) claim of universality: "good things" are anything that a lot of people support, and "are good" means truly pareto-preferred with no harm to anyone. Your explanation "by a reasonable person" is pretty limiting, there being no persons who are reasonable on all topics. Likewise "actually good" - I think there's no way to know even after it happens.

a common discussion pattern: person 1 claims X solves/is an angle of attack on problem P. person 2 is skeptical. there is also some subproblem Q (90% of the time not mentioned explicitly). person 1 is defending a claim like "X solves P conditional on Q already being solved (but Q is easy)", whereas person 2 thinks person 1 is defending "X solves P via solving Q", and person 2 also believes something like "subproblem Q is hard". the problem with this discussion pattern is it can lead to some very frustrating miscommunication:

  • if the discussion recurses into whether Q is hard, person 1 can get frustrated because it feels like a diversion from the part they actually care about/have tried to find a solution for, which is how to find a solution to P given a solution to Q (again, usually Q is some implicit assumption that you might not even notice you have). it can feel like person 2 is nitpicking or coming up with fully general counterarguments for why X can never be solved.
  • person 2 can get frustrated because it feels like the original proposed solution doesn't engage with the hard subproblem Q. person 2 believes that assuming Q were solved, then there would be many other proposals other than X that would also suffice to solve problem P, so that the core ideas of X actually aren't that important, and all the work is actually being done by assuming Q.
Max H (3 points, 1y):
I can see how this could be a frustrating pattern for both parties, but I think it's often an important conversation tree to explore when person 1 (or anyone) is using results about P in restricted domains to make larger claims or arguments about something that depends on solving P at the hardest difficulty setting in the least convenient possible world. As an example, consider the following three posts:

  • Challenge: construct a Gradient Hacker
  • Gradient hacking is extremely difficult
  • My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"

I think both of the first two posts are valuable and important work on formulating and analyzing restricted subproblems. But I object to citation of the second post (in the third post) as evidence in support of a larger point that doom from mesa-optimizers or gradient descent is unlikely in the real world, and object to the second post to the degree that it is implicitly making this claim. There's an asymmetry when person 1 is arguing for an optimistic view on AI x-risk and person 2 is arguing for a doomer-ish view, in the sense that person 1 has to address all counterarguments but person 2 only has to find one hole. But this asymmetry is unfortunately a fact about the problem domain and not the argument / discussion pattern between 1 and 2.
Dagon (2 points, 10mo):
I find myself in person 2's position fairly often, and it is INCREDIBLY frustrating for person 1 to claim they've "solved" P, when they're ignoring the actual hard part (or one of the hard parts).  And then they get MAD when I point out why their "solution" is ineffective.  Oh, wait, I'm also extremely annoyed when person 2 won't even take steps to CONSIDER my solution - maybe subproblem Q is actually easy, when the path to victory aside from that is clarified. In neither case can any progress be made without actually addressing how Q fits into P, and what is the actual detailed claim of improvement of X in the face of both Q and non-Q elements of P.   
the gears to ascension (2 points, 1y):
yeah, but that's because Q is easy if you solve P. Very nicely described; this might benefit from becoming a top level post.
DPiepgrass (1 point, 1y):
For example?
leogao (5 points, 1y):
here's a straw hypothetical example where I've exaggerated both 1 and 2; the details aren't exactly correct but the vibe is more important:

1: "Here's a super clever extension of debate that mitigates obfuscated arguments [etc], this should just solve alignment"
2: "Debate works if you can actually set the goals of the agents (i.e you've solved inner alignment), but otherwise you can get issues with the agents coordinating [etc]"
1: "Well the goals have to be inside the NN somewhere so we can probably just do something with interpretability or whatever"
2: "how are you going to do that? your scheme doesn't tackle inner alignment, which seems to contain almost all of the difficulty of alignment to me. the claim you just made is a separate claim from your main scheme, and the cleverness in your scheme is in a direction orthogonal to this claim"
1: "idk, also that's a fully general counterargument to any alignment scheme, you can always just say 'but what if inner misalignment'. I feel like you're not really engaging with the meat of my proposal, you've just found a thing you can say to be cynical and dismissive of any proposal"
2: "but I think most of the difficulty of alignment is in inner alignment, and schemes which kinda handwave it away are trying to solve some problem which is not the actual problem we need to solve to not die from AGI. I agree your scheme would work if inner alignment weren't a problem."
1: "so you agree that in a pretty nontrivial number [let's say both 1&2 agree this is like 20% or something] of worlds my scheme does actually work - I mean how can you be that confident that inner alignment is that hard? in the worlds where inner alignment turns out to be easy then my scheme will work."
2: "I'm not super confident, but if we assume that inner alignment is easy then I think many other simpler schemes will also work, so the cleverness that your proposal adds doesn't actually make a big difference."
DPiepgrass (1 point, 10mo):
So Q=inner alignment? Seems like person 2 not only pointed to inner alignment explicitly (so it can no longer be "some implicit assumption that you might not even notice you have"), but also said that it "seems to contain almost all of the difficulty of alignment to me". He's clearly identified inner alignment as a crux, rather than as something meant "to be cynical and dismissive". At that point, it would have been prudent of person 1 to shift his focus onto inner alignment and explain why he thinks it is not hard. Note that your post suddenly introduces "Y" without defining it. I think you meant "X".

One possible model of AI development is as follows: there exists some threshold beyond which capabilities are powerful enough to cause an x-risk, and such that we need alignment progress to be at the level needed to align that system before it comes into existence. I find it informative to think of this as a race where for capabilities the finish line is x-risk-capable AGI, and for alignment this is the ability to align x-risk-capable AGI. In this model, alignment reaching its finish line before capabilities does is necessary but not sufficient for good outcomes: if alignment doesn't make it there first, then we automatically lose, but even if it does, if alignment doesn't continue to improve proportional to capabilities, we might also fail at some later point. However, I think it's plausible we're not even on track for the necessary condition, so I'll focus on that within this post.

Given my distributions over how difficult AGI and alignment respectively are, and the amount of effort brought to bear on each of these problems, I think there's a worryingly large chance that we just won't have the alignment progress needed at the critical juncture.

I also think it's ... (read more)

one man's modus tollens is another man's modus ponens:

"making progress without empirical feedback loops is really hard, so we should get feedback loops where possible" "in some cases (i.e close to x-risk), building feedback loops is not possible, so we need to figure out how to make progress without empirical feedback loops. this is (part of) why alignment is hard"

Raemon (4 points, 1y):
Yeah something in this space seems like a central crux to me. I personally think (as a person generally in the MIRI-ish camp of "most attempts at empirical work are flawed/confused"), that it's not crazy to look at the situation and say "okay, but, theoretical progress seems even more flawed/confused, we just need to figure out some way of getting empirical feedback loops." I think there are some constraints on how the empirical work can possibly work. (I don't think I have a short thing I could write here, I have a vague hope of writing up a longer post on "what I think needs to be true, for empirical work to be helping rather than confusedly not-really-helping")
the gears to ascension (2 points, 1y):
you gain general logical facts from empirical work, which can aid in providing a blurry image of the manifold that the precise theoretical work is trying to build an exact representation of

A common cycle:

  1. This model is too oversimplified! Reality is more complex than this model suggests, making it less useful in practice. We should really be taking these into account. [optional: include jabs at outgroup]
  2. This model is too complex! It takes into account a bunch of unimportant things, making it much harder to use in practice. We should use this simplified model instead. [optional: include jabs at outgroup]

Sometimes this even results in better models over time.

for something to be a good way of learning, the following criteria have to be met:

  • tight feedback loops
  • transfer of knowledge to your ultimate goal
  • sufficiently interesting that it doesn't feel like a grind

trying to do the thing you care about directly hits 2 but can fail 1 and 3. many things that you can study hit 1 but fail 2 and 3. and of course, many fun games hit 3 (and sometimes 1) but fail to hit 2.

leogao (2 points, 4mo):
corollary: for things with very long feedback loops, or where you aren't motivated by default, it can be faster for learning to do something that is actually not directly the thing you care about
Viliam (2 points, 4mo):
This is basically math (and computer science) education. On one hand, some parts are probably not very useful. On the other hand, some people expect that teachers will defend every single step along the way by explaining how specifically this tiny atom of knowledge improves the student's future life. No, I am not preparing a PowerPoint presentation on how knowing that addition is associative and commutative will make you rich one day.
leogao (2 points, 4mo):
funnily enough, my experience has been almost entirely from the other direction - almost everything I know is from working directly on things I care about, and very little is from study. one of the reasons behind this shortform was trying to untangle why people spend lots of time studying stuff and whether/when it makes sense for me to study vs simply to learn by doing
Viliam (2 points, 4mo):
I think it is good to use your goals as a general motivation for going approximately in some direction, but the opposite extreme of obsessing whether every single detail you learn contributes to the goal is premature optimization. It reminds me of companies where, before you are allowed to spend 1 hour doing something, the entire team first needs to spend 10 hours in various meetings to determine whether that 1 hour would be spent optimally. I would rather spend all that time doing things, even if some of them turn out to be ultimately useless. Sometimes it's not even obvious in advance which knowledge will turn out to be useful.

lifehack: buying 3 cheap pocket sized battery packs costs like $60 and basically eliminates the problem of running out of phone charge on the go. it's much easier to remember to charge them because you can instantaneously exchange your empty battery pack for a full one when you realize you need one; plugging in the empty battery pack happens exactly when you swap for a fresh one; and even if you forget once or lose one you have some slack

the project of rationality is the project of becoming slightly more mesaoptimizery

Corollary to Others are wrong != I am right (https://www.lesswrong.com/posts/4QemtxDFaGXyGSrGD/other-people-are-wrong-vs-i-am-right): It is far easier to convince me that I'm wrong than to convince me that you're right.

JBlack (3 points, 1y):
Quite a large proportion of my 1:1 arguments start when I express some low expectation of the other person's argument being correct. This is almost always taken to mean that I believe that some opposing conclusion is correct. Usually I have to give up before being able to successfully communicate the distinction, let alone addressing the actual disagreement.

Some aspirational personal epistemic rules for keeping discussions as truth seeking as possible (not at all novel whatsoever, I'm sure there exist 5 posts on every single one of these points that are more eloquent)

  • If I am arguing for a position, I must be open to the possibility that my interlocutor may turn out to be correct. (This does not mean that I should expect to be correct exactly 50% of the time, but it does mean that if I feel like I'm never wrong in discussions then that's a warning sign: I'm either being epistemically unhealthy or I'm talking
... (read more)
Vladimir_Nesov (4 points, 1y):
I find it a helpful framing to instead allow things that feel obviously false to become more familiar, giving them the opportunity to develop a strong enough voice to explain how they are right. That is, the action is on the side of unfamiliar false things, clarifying their meaning and justification, rather than on the side of familiar true things, refuting their correctness. It's harder to break out of a familiar narrative from within.

current understanding of optimization

  • high curvature directions (hessian eigenvectors with high eigenvalue) want small lrs. low curvature directions want big lrs
  • if the lr in a direction is too small, it takes forever to converge. if the lr is too big, it diverges by oscillating with increasing amplitude
  • momentum helps because if your lr is too small, it makes you move a bit faster. if your lr is too big, it causes the oscillations to cancel out with themselves. this makes high curvature directions more ok with larger lrs and low curvature directions more ok
... (read more)
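(a toy numerical illustration of the first few bullets - an assumed example, not from the original notes: gradient descent with heavy-ball momentum on a quadratic whose Hessian has one high-curvature and one low-curvature eigenvalue; the eigenvalues and learning rates are made up)

```python
# Toy illustration: GD with momentum on a quadratic with eigenvalues 100 (stiff) and 1 (flat).
import numpy as np

H = np.diag([100.0, 1.0])  # high- and low-curvature directions

def run(lr, beta, steps=100):
    x = np.array([1.0, 1.0])
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v + H @ x   # momentum buffer accumulates the gradient
        x = x - lr * v
    return x

print(run(lr=0.015, beta=0.0))  # stable, but the flat direction converges slowly
print(run(lr=0.021, beta=0.0))  # lr > 2/100: the stiff direction oscillates with growing amplitude
print(run(lr=0.021, beta=0.9))  # momentum damps the oscillation and speeds up the flat direction
```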
RHollerith (2 points, 3mo):
What does "the lr" mean in this context?
leogao (2 points, 3mo):
learning rate

adhd is a mechanism for seeking domains with tight feedback loops

Understanding how an abstraction works under the hood is useful because it gives you intuitions for when it's likely to leak and what to do in those cases.

takes on takeoff (or: Why Aren't The Models Mesaoptimizer-y Yet)

here are some reasons we might care about discontinuities:

  • alignment techniques that apply before the discontinuity may stop applying after / become much less effective
    • makes it harder to do alignment research before the discontinuity that transfers to after the discontinuity (because there is something qualitatively different after the jump)
    • second order effect: may result in false sense of security
  • there may be less/negative time between a warning shot and the End
    • harder to coordinate and slow do
... (read more)

The following things are not the same:

  • Schemes for taking multiple unaligned AIs and trying to build an aligned system out of the whole
    • I think this is just not possible.
  • Schemes for taking aligned but less powerful AIs and leveraging them to align a more powerful AI (possibly with amplification involved)
    • This breaks if there are cases where supervising is harder than generating, or if there is a discontinuity. I think it's plausible something like this could work but I'm not super convinced.

In the spirit of https://www.lesswrong.com/posts/fFY2HeC9i2Tx8FEnK/my-resentful-story-of-becoming-a-medical-miracle , some anecdotes about things I have tried, in the hopes that I can be someone else's "one guy on a message board". None of this is medical advice, etc.

  • No noticeable effects from vitamin D (both with and without K2), even though I used to live somewhere where the sun barely shines and also I never went outside, so I was almost certainly deficient.
  • I tried Selenium (200mg) twice and both times I felt like utter shit the next day.
  • Glycine (2g) for
... (read more)

hypothesis: the kind of reasoning that causes ML people to say "we have made no progress towards AGI whatsoever" is closely analogous to the kind of reasoning that makes alignment people say "we have made no progress towards hard alignment whatsoever"

ML people see stuff like GPT4 and correctly notice that it's in fact kind of dumb and bad at generalization in the same ways that ML always has been. they make an incorrect extrapolation, which is that AGI must therefore be 100 years away, rather than 10 years away

high p(doom) alignment people see current mode... (read more)

quetzal_rainbow (1 point, 3mo):
It's differential progress that matters in alignment. I.e., if you expect that we need an additional year of alignment research after creating AGI, it still looks pretty doomed, even if you admit overall progress in the field.
leogao (2 points, 3mo):
sure, but seems orthogonal to the thing i'm describing - the claim is that a lot of alignment work on current models has ~no bearing on progress towards aligning AGI.

One of the greatest tragedies of truth-seeking as a human is that the things we instinctively do when someone else is wrong are often the exact opposite of the thing that would actually convince the other person.

niplav (2 points, 1y):
Rightfully so! Read your piece back in 2021 and found it true & straightforward.

an interesting fact that I notice is that in domains where there are a lot of objects in consideration, those objects have some structure so that they can be classified, and how often those objects occur follows a power law or something, there are two very different frames that get used to think about that domain:

  • a bucket of atomic, structureless objects with unique properties where facts about one object don't really generalize at all to any other object
  • a systematized, hierarchy or composition of properties or "periodic table" or full grid or objec
... (read more)

it is often claimed that merely passively absorbing information is not sufficient for learning, but rather some amount of intentional learning is needed. I think this is true in general. however, one interesting benefit of passively absorbing information is that you notice some concepts/terms/areas come up more often than others. this is useful because there's simply too much stuff out there to learn, and some knowledge is a lot more useful than other knowledge. noticing which kinds of things come up often is therefore useful for prioritization. I often notice that my motivational system really likes to use this heuristic for deciding how motivated to be while learning something.

[anonymous] (1 point, 5mo):
I think it might also depend on your goals. Like how fast you want to learn something. If you have less than ideal time, then maybe more structured learning is necessary. If you have more time then periods of structureless/passive learning could be beneficial.

retargetability might be the distinguishing factor between controllers and optimizers

1a3orn (3 points, 1y):
as in, controllers are generally retargetable and optimizers aren't? or vice-versa? would be interested in reasoning, either way

a claim I've been saying irl for a while but have never gotten around to writing up: current LLMs are benign not because of the language modelling objective, but because of the generalization properties of current NNs (or to be more precise, the lack thereof). with better generalization LLMs are dangerous too. we can also notice that RL policies are benign in the same ways, which should not be the case if the objective was the core reason. one thing that can go wrong with this assumption is thinking about LLMs that are both extremely good at generalizing ... (read more)

Daniel Paleka (1 point, 1y):
what is the "language models are benign because of the language modeling objective" take?
leogao (2 points, 1y):
basically the Simulators kind of take afaict

House rules for definitional disputes:

  • If it ever becomes a point of dispute in an object level discussion what a word means, you should either use a commonly accepted definition, or taboo the term if the participants think those definitions are bad for the context of the current discussion. (If the conversation participants are comfortable with it, the new term can occupy the same namespace as the old tabooed term (i.e going forward, we all agree that the definition of X is Y for the purposes of this conversation, and all other definitions no longer appl
... (read more)

A few axes along which to classify optimizers:

  • Competence: An optimizer is more competent if it achieves the objective more frequently on distribution
  • Capabilities Robustness: An optimizer is more capabilities robust if it can handle a broader range of OOD world states (and thus possible pertubations) competently.
  • Generality: An optimizer is more general if it can represent and achieve a broader range of different objectives
  • Real-world objectives: whether the optimizer is capable of having objectives about things in the real world.

Some observations: it feels l... (read more)

leogao (3 points, 2y):
Another generator-discriminator gap: telling whether an outcome is good (outcome->R) is much easier than coming up with plans to achieve good outcomes. Telling whether a plan is good (plan->R) is much harder, because you need a world model (plan->outcome) as well, but for very difficult tasks it still seems easier than just coming up with good plans off the bat. However, it feels like the world model is the hardest part here, not just because of embeddedness problems, but in general because knowing the consequences of your actions is really really hard. So it seems like for most consequentialist optimizers, the quality of the world model actually becomes the main thing that matters.

This also suggests another dimension along which to classify our optimizers: the degree to which they care about consequences in the future (I want to say myopia but that term is already way too overloaded). This is relevant because the further in the future you care about, the more robust your world model has to be, as errors accumulate the more steps you roll the model out (or the more abstraction you do along the time axis).

Very low confidence but maybe this suggests that mesaoptimizers probably won't care about things very far in the future, because building a robust world model is hard and so they would perform worse on the training distribution, so SGD pushes for more myopic mesaobjectives? Though note, this kind of myopia is not quite the kind we need for models to avoid caring about the real world/coordinating with itself.

A thought pattern that I've noticed myself and others falling into sometimes: Sometimes I will make arguments about things from first principles that look something like "I don't see any way X can be true, it clearly follows from [premises] that X is definitely false", even though there are people who believe X is true. When this happens, it's almost always unproductive to continue to argue on first principles, but rather I should do one of: a) try to better understand the argument and find a more specific crux to disagree on or b) decide that this topic isn't worth investing more time in, register it as "not sure if X is true" in my mind, and move on.

Dagon (4 points, 2y):
For many such questions, "is X true" is the wrong question.  This is common when X isn't a testable proposition, it's a model or assertion of causal weight.  If you can't think of existence proofs that would confirm it, try to reframe as "under what conditions is X a useful model?".

there are policies which are successful because they describe a particular strategy to follow (non-mesaoptimizers), and policies that contain some strategy for discovering more strategies (mesaoptimizers). a way to view the relation this has to speed/complexity priors that doesn't depend on search in particular is that policies that work by discovering strategies tend to be simpler and more generic (they bake in very little domain knowledge/metis, and are applicable to a broader set of situations because they work by coming up with a strategy for the task ... (read more)

leogao (3 points, 5mo):
another observation is that a meta-strategy with the ability to figure out what strategy is good is kind of defined by the fact that it doesn't bake in specifics of dealing with a particular situation, but rather can adapt to a broad set of situations. there are also different degrees of meta-strategy-ness; some meta strategies will more quickly adapt to a broader set of situations. (there's probably some sort of NFLT kind of argument you can make but NFLTs in general don't really matter)
leogao (2 points, 5mo):
the ability to figure out strategies doesn't necessarily have to be all reasoning, it can also encompass the experimental skillset

random brainstorming about optimizeryness vs controller/lookuptableyness:

let's think of optimizers as things that reliably steer a broad set of initial states to some specific terminal state. seems like there are two things we care about (at least):

  • retargetability: it should be possible to change the policy to achieve different terminal states (but this is an insufficiently strong condition, because LUTs also trivially meet this condition, because we can always just completely rewrite the LUT. maybe the actual condition we want is that the complexity of t
... (read more)

a tentative model of ambitious research projects

when you do a big research project, you have some amount of risk you can work with - maybe you're trying to do something incremental, so you can only tolerate a 10% chance of failure, or maybe you're trying to shoot for the moon and so you can accept a 90% chance of failure.

budgeting for risk is non negotiable because there are a lot of places where risk can creep in - and if there isn't, then you're not really doing research. most obviously, your direction might just be a dead end. but there are also other t... (read more)

https://arxiv.org/abs/2304.08612 : interesting paper with improvement on straight through estimator

https://arxiv.org/abs/2302.07011 : sharpness doesn't seem to correlate with generalization

the phenomenon of strange bedfellows is probably caused in no small part by outgroup vs fargroup dynamics

'And what ingenious maneuvers they all propose to me! It seems to them that when they have thought of two or three contingencies' (he remembered the general plan sent him from Petersburg) 'they have foreseen everything. But the contingencies are endless.'

We spend a lot of time on trying to figure out empirical evidence to distinguish hypotheses we have that make very similar predictions, but I think a potentially underrated first step is to make sure they actually fit the data we already have.

Thomas Kwa (4 points, 6mo):
Example?

Is the correlation between sleeping too long and bad health actually because sleeping too long is actually causally upstream of bad health effects, or only causally downstream of some common cause like illness?

Portia (1 point, 1y):
Afaik, both. Like a lot of shit things - they are caused by depression, and they cause depression, horrible reinforcing loop. While the effect of bad health on sleep is obvious, you can also see this work in reverse; e.g. temporary severe sleep restriction has an anti-depressive effect. Notable, though with not many useful clinical applications, as constant sleep deprivation is also really unhealthy.

GPT-2-xl unembedding matrix looks pretty close to full rank (plot is singular values)
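(a sketch of one way to check this, assuming HuggingFace transformers and taking "unembedding" to mean the lm_head weight matrix - not necessarily how the original plot was produced)

```python
# Sketch: compute the singular values of GPT-2-xl's lm_head weight.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
W = model.lm_head.weight.detach()     # [vocab_size, d_model] = [50257, 1600]
s = torch.linalg.svdvals(W.float())   # singular values, largest first
print(s[:5], s[-5:])                  # compare the largest and smallest singular values
print((s > 1e-3 * s[0]).sum().item(), "of", s.numel(), "singular values above 1e-3 * max")
```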

Unsupervised learning can learn things humans can't supervise because there's structure in the world that you need deeper understanding to predict accurately. For example, to predict how characters in a story will behave, you have to have some kind of understanding in some sense of how those characters think, even if their thoughts are never explicitly visible.

Unfortunately, this understanding only has to be structured in a way that makes reading off the actual unsupervised targets (i.e next observation) easy.

An incentive structure for scalable trusted prediction market resolutions

We might want to make a trustable committee for resolving prediction markets. We might be worried that individual resolvers might build up reputation only to exit-scam, due to finite time horizons and non transferability of reputational capital. However, shareholders of a public company are more incentivized to preserve the value of the reputational capital. Based on this idea, we can set something up as follows:

  • Market creators pay a fee for the services of a resolution company
  • There i
... (read more)
Dagon (0 points, 1y):
It's amazing how many proposals for dealing with institutional distrust sound a lot like "make a new institution, with the same structure, but with better actors."  You lose me at "trustable committee", especially when you don't describe how THOSE humans are motivated by truth and beauty, rather than filthy lucre.  Adding more layers of committees doesn't help, unless you define a "final, un-appealable decision" that's sooner than the full shareholder vote.  
leogao (2 points, 1y):
the core of the proposal really boils down to "public companies have less incentive to cash in on reputation and exit scam than individuals". this proposal is explicitly not "the same structure but with better actors".

Levels of difficulty:

  1. Mathematically proven to be impossible (e.g. perfect compression)
  2. Impossible under currently known laws of physics (e.g. perpetual motion machines)
  3. A lot of people have thought very hard about it and cannot prove that it's impossible, but strongly suspect it is impossible (e.g. solving NP problems in P)
  4. A lot of people have thought very hard about it, and have not succeeded, but we have no strong reason to expect it to be impossible (e.g. AGI)
  5. There is a strong incentive for success, and the markets are very efficient, so that for partic
... (read more)

(random shower thoughts written with basically no editing)

Sometimes arguments have a beat that looks like "there is extreme position X, and opposing extreme position Y. what about a moderate 'Combination' position?" (I've noticed this in both my own and others' arguments)

I think there are sometimes some problems with this.

  • Usually almost nobody is on the most extreme ends of the spectrum. Nearly everyone falls into the "Combination" bucket technically, so in practice you have to draw the boundary between "combination enough" vs "not combination enough to
... (read more)
leogao (3 points, 1y):
related take: "things are more nuanced than they seem" is valuable only as the summary of a detailed exploration of the nuance that engages heavily with object level cruxes; the heavy lifting is done by the exploration, not the summary

Subjective Individualism

TL;DR: This is basically empty individualism except identity is disentangled from cooperation (accomplished via FDT), and each agent can have its own subjective views on what would count as continuity of identity and have preferences over that. I claim that:

  1. Continuity is a property of the subjective experience of each observer-moment (OM), not necessarily of any underlying causal or temporal relation. (i.e I believe at this moment that I am experiencing continuity, but this belief is a fact of my current OM only. Being a Boltzmann b
... (read more)

Imagine if aliens showed up at your doorstep and tried to explain to you that making as many paperclips as possible was the ultimate source of value in the universe. They show pictures of things that count as paperclips and things that don't count as paperclips. They show you the long rambling definition of what counts as a paperclip from Section 23(b)(iii) of the Declaration of Paperclippian Values. They show you pages and pages of philosophers waxing poetical about how paperclips are great because of their incredible aesthetic value. You would be like, "... (read more)

Dagon (7 points, 2y):
I think I'd be confused.  Do they care about more or better paperclips, or do they care about worship of paperclips by thinking beings?  Why would they care whether I say I would do anything for paperclips, when I'm not actually making paperclips (or disassembling myself to become paperclips)?
leogao (1 point, 2y):
I thought it would be obvious from context but the answers are "doesn't really matter, any of those examples work" and "because they will send everyone to the paperclip mines after ensuring there are no rebellious sentiments", respectively. I've edited it to be clearer.

random thoughts. no pretense that any of this is original or useful for anyone but me or even correct

  • It's ok to want the world to be better and to take actions to make that happen but unproductive to be frustrated about it or to complain that a plan which should work in a better world doesn't work in this world. To make the world the way you want it to be, you have to first understand how it is. This sounds obvious when stated abstractly but is surprisingly hard to adhere to in practice.
  • It would be really nice to have some evolved version of calibration
... (read more)
leogao (3 points, 1y):
self self improvement improvement: feeling guilty about not self improving enough and trying to fix your own ability to fix your own abilities
leogao (1 point, 1y):
  • Lots of things have very counterintuitive or indirect values. If you don't take this into account and you make decisions based on maximizing value you might end up McNamara-ing yourself hard.
  • The stages of learning something: (1) "this is super overwhelming! I don't think I'll ever understand it. there are so many things I need to keep track of. just trying to wrap my mind around it makes me feel slightly queasy" (2) "hmm this seems to actually make some sense, I'm starting to get the hang of this" (3) "this is so simple and obviously true, I've always known it to be true, I can't believe anyone doesn't understand this" (you start noticing that your explanations of the thing become indistinguishable from the things you originally felt overwhelmed by) (4) "this new thing [that builds on top of the thing you just learned] is super overwhelming! I don't think I'll ever understand it"
  • The feeling of regret really sucks. This is a bad thing, because it creates an incentive to never reflect on things or realize your mistakes. This shows up as a quite painful aversion to reflecting on mistakes, doing a postmortem, and improving. I would like to somehow trick my brain into reframing things somehow. Maybe thinking of it as a strict improvement over the status quo of having done things wrong? Or maybe reminding myself that the regret will be even worse if I don't do anything because I'll regret not reflecting in addition

Thought pattern that I've noticed: I seem to have two sets of epistemic states at any time: one more stable set that more accurately reflects my "actual" beliefs that changes fairly slowly, and one set of "hypothesis" beliefs that changes rapidly. Usually when I think some direction is interesting, I alternate my hypothesis beliefs between assuming key claims are true or false and trying to convince myself either way, and if I succeed then I integrate it into my actual beliefs. In practice this might look like alternating between trying to prove something ... (read more)

Dagon (2 points, 2y):
I think this pattern is common among intellectuals, and I'm surprised it's causing confusion.  Are you labeling your exploratory beliefs and statements appropriately?  An "epistemic status" note for posts here goes a long way, and in private conversation I often say out loud "I'm exploring here, don't take it as what I fully believe" in conversations at work and with friends.
leogao (3 points, 2y):
I think I do a poor job of labelling my statements (at least, in conversation. usually I do a bit better in post format). Something something illusion of transparency. To be honest, I didn't even realize explicitly that I was doing this until fairly recent reflection on it.