Both writesr take it as common knowledge that the reasons to not take the virus are stupid and wrong, and that the job is to fix what’s wrong with these soldiers who are refusing.
"writers" and "not take the vaccine", no?
Really? The main claim is presented "in an outrageous way"?
I can imagine reading the post and being unconvinced by the evidence presented. In fact, that was my reaction (although I haven't watched the videos yet). But... being outraged?
Posts should not make large, unsupported claims, and criticism should not be hyperbolic. Here is what I have learned from your critique:
Comment #1000 on LessWrong :)
This arbitrary choice effectively unrolls the state graph into a tree with a constant branching factor (plus self-loops at the terminal states), and we get that the POWER of every state is equal.
Not necessarily true - you're still considering the IID case.
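(For intuition about what the IID assumption buys, here's a toy Monte Carlo sketch; it illustrates the "more options → more POWER" heuristic, not the paper's exact definition, and the state/reward setup is a made-up example:)

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_power(num_options: int, samples: int = 100_000) -> float:
    """Average best-achievable reward for a state whose `num_options`
    successor states are absorbing, with IID Uniform(0,1) rewards."""
    rewards = rng.random((samples, num_options))
    return rewards.max(axis=1).mean()

for k in (1, 2, 5):
    print(k, round(toy_power(k), 3))
# Analytically, E[max of k draws] = k/(k+1): ~0.5, ~0.667, ~0.833.
# More options -> more best-case reward on average, under IID rewards.
```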
I think using a well-chosen reward distribution is necessary, otherwise POWER depends on arbitrary choices in the design of the MDP's state graph. E.g. suppose the student in the above example writes about every action they take in a blog that no one reads, and we choose to include the content of the
LeCun claims too much. It's true that the case of animals like orangutans points to a class of cognitive architectures which seemingly don't prioritize power-seeking. It's true that this is some evidence against power-seeking behavior being common amongst relevant cognitive architectures. However, it doesn't show that instrumental subgoals are much weaker drives of behavior than hardwired objectives.
One reading of this "drives of behavior" claim is that it has to be tautological; by definition, instrumental subgoals are always in service of the (hardwired)... (read more)
Two clarifications. First, even in the existing version, POWER can be defined for any bounded reward function distribution - not just IID ones. Second, the power-seeking results no longer require IID. Most reward function distributions incentivize POWER-seeking, both in the formal sense, and in the qualitative "keeping options open" sense.
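(For reference, a sketch of the paper's definition, from memory; treat the details as approximate:)

$$\mathrm{POWER}_{\mathcal{D}}(s,\gamma) \;=\; \frac{1-\gamma}{\gamma}\,\mathbb{E}_{R \sim \mathcal{D}}\!\left[V^{*}_{R}(s,\gamma) - R(s)\right]$$

That is, the normalized extra optimal value available from state s, averaged over reward functions drawn from the distribution D; as noted above, D need not be IID across states.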
To address your main point, though, I think we'll need to get more concrete. Let's represent the situation with a state diagram.
[state diagram; one of the labeled paths: "go to college right away"]
Both you and Rohin... (read more)
Right. But what does this have to do with your “different concept” claim?
When proving theorems for my research, I often take time to consider the weakest conditions under which the desired result holds - even if it's just a relatively unimportant and narrow lemma. By understanding the weakest conditions, you isolate the load-bearing requirements for the phenomenon of interest. I find this helps me build better gears-level models of the mathematical object I'm studying. Furthermore, understanding the result in generality allows me to recognize analogies and cross-over opportunities in the future. Lastly, I just find this plain satisfying.
I think the draft tends to use the term power to point to an intuitive concept of power/influence (the thing that we expect a random agent to seek due to the instrumental convergence thesis). But I think the definition above (or at least the version in the cited paper) points to a different concept, because a random agent has a single objective (rather than an intrinsic goal of getting to a state that would be advantageous for many different objectives).
This is indeed a misunderstanding. My paper analyzes the single-objective setting; no intrinsic power-seeking drive is assumed.
Ok, so the main advice is: don't make a card for everything, just the important concepts. And those concepts can be found in "cheatsheets" and "course review notes", it seems — unfortunately, I don't have any of those things.
Why not use Google to find notes from other schools?
I left comments on a portion of a copy of your power-seeking writeup.
I like the current doc a lot. I also feel, though, that it doesn't consider some big formal hints and insights we've gotten from my work over the past two years.
Very recently, I was able to show the following strong result:
Some researchers have speculated that capable reinforcement learning (RL) agents are often incentivized to seek resources and power in pursuit of their objectives. While seeking power in order to optimize a misspecified objective, agents might be incentivized to beh
Definitely not too late to make cards. I've learned a great deal of basic chemistry in the last month or so, just studying for random 30-minute chunks and binging interesting-looking wikipedia articles. A month+ is plenty of time for spaced repetition to work its magic.
For math, I recommend Anki; for non-LaTeX-intensive subjects like biology, I recommend SuperMemo for fast card creation while you read and review material. Unfortunately, I have yet to write up my thoughts on the latter.
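For intuition about why a month-plus is plenty of time, here's a simplified sketch of the classic (public) SM-2 schedule; SuperMemo's current scheduler is far more sophisticated, and the easiness factor below is just the algorithm's default assumption:

```python
def sm2_intervals(easiness: float = 2.5, reviews: int = 5) -> list[int]:
    """Simplified SM-2: 1 day, then 6 days, then multiply the previous
    interval by the card's easiness factor (assuming perfect recall)."""
    intervals, interval = [], 0
    for n in range(1, reviews + 1):
        if n == 1:
            interval = 1
        elif n == 2:
            interval = 6
        else:
            interval = round(interval * easiness)
        intervals.append(interval)
    return intervals

print(sm2_intervals())  # [1, 6, 15, 38, 95]: month-scale by the fourth review
```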
I like to look at the "cheat sheets" for courses and ensure I know how t... (read more)
The Pfizer phase 3 study's last endpoint is 7 days after the second shot. Does anyone know why the CDC recommends waiting 2 weeks for full protection? Are they just being the CDC again?
Frustratingly, the phase 3 trials don't report this number. But using some data included in the Pfizer phase 3, I was able to make this graph:
[graph image]
The image isn't loading for me on LW, although it does load if I right-click and select 'open in a new tab.'
This was run on davinci via the OpenAI API. First completion.
ML starts running factories, warehouses, shipping, and construction. ML assistants help write code and integrate ML into new domains. ML designers help build factories and the robots that go in them. ML finance systems invest in companies on the basis of complicated forecasts and (ML-generated) audits. Tons of new factories, warehouses, power plants, trucks and roads are being built. Things are happening quickly, investors have super strong FOMO, no one real
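For reference, a completion like this could be generated with the era-appropriate OpenAI Python client (a sketch: the prompt and sampling parameters are assumptions, and this uses the pre-1.0 `openai` package):

```python
import openai  # requires openai<1.0, the client current at the time

openai.api_key = "sk-..."  # your API key

response = openai.Completion.create(
    engine="davinci",   # the base GPT-3 model named above
    prompt="...",       # the story prompt (not reproduced here)
    max_tokens=256,     # sampling parameters are assumptions
    temperature=0.7,
)
print(response.choices[0].text)
```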
Apparently VMs are the way to go for PDF support on Linux.
It's a spaced repetition system that focuses on incremental reading. It's like Anki, but instead of hosting flashcards separately from your reading, you extract text while reading documents and PDFs. You later refine extracts into ever-smaller chunks of knowledge, at which point you create the "flashcard" (usually 'clozes', demonstrated below).
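For illustration (a hypothetical stand-in, since the original demo image isn't reproduced here): an extract like "Mitochondria are the powerhouse of the cell" becomes the cloze "Mitochondria are the [...] of the cell", with answer "powerhouse".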
In... (read more)
I don’t follow the last bit. If ghosts were real, the first-order news would be amazing: maybe humanity wouldn’t have truly lost the brain-information of any human, ever!
The all-or-nothing vaccine hypothesis is:
But maybe the vaccine is 100% effective against all outcomes! So long as it’s correctly transported and administered, that is. Except sometimes vaccines are left at high temperature for too long, the delicate proteins are damaged, and people receiving them are effectively not vaccinated. If this happens 5% of the time, then 95% of people are completely immune to Covid and 5% are no different from the unvaccinated. Whatever chance they had of getting severe Covid before, it’s the same now.
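To make the arithmetic concrete, here's a minimal sketch (the 2% baseline risk is an illustrative assumption, not trial data):

```python
baseline_risk = 0.02     # assumed risk of severe Covid if unvaccinated (illustrative)
immune_fraction = 0.95   # fully protected, under the all-or-nothing model

# Only the unprotected 5% retain their baseline risk:
vaccinated_risk = (1 - immune_fraction) * baseline_risk
efficacy = 1 - vaccinated_risk / baseline_risk
print(f"risk if vaccinated: {vaccinated_risk:.4f}; measured efficacy: {efficacy:.0%}")
# The baseline risk cancels, so measured efficacy equals the immune fraction
# (95%) against *every* outcome: the all-or-nothing signature.
```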
If all-or-nothing were true, you... (read more)
Nope, that seems roughly right. It is I who failed to propagate. It was a cached argument from before I'd looked at the data. I'll update the post shortly with this. Thanks for pointing it out.
Do you think such humans would have a high probability of working on TAI alignment, compared to working on actually making TAI?
I think you are indeed making a mistake by letting unsourced FB claims worry you, given the known proliferation of antivax-driven misinformation. There is an extremely low probability that you're first hearing about a real issue via some random, unsourced FB comment.
For more evidence, look to the overreactions to J&J / AZ adverse effects. Regulatory bodies are clearly willing to make a public fuss over even small probabilities of things going wrong.
Evolution requires some amount of mutation, which is occasionally beneficial to the species. Species that were too good at preventing mutations would be unable to adapt to changing environmental conditions, and thus die out.
We're aware of many species which evolved to extinction. I guess I'm looking for why there's no plausible "path" in genome-space between this arrangement and an arrangement which makes fatal errors happen less frequently (e.g., why wouldn't it be locally beneficial to the individual genes to code for more robustness against spontaneous abortions?), or an argument that this just isn't possible for evolution to find (like wheels instead of legs, or machine guns instead of claws).
I feel confused wrt the genetic mutation hypothesis for the spontaneous abortion phenomenon. Wouldn't genes which stop the baby from being born quickly exit the gene pool? Similarly for gamete formation processes which allow such mutations to arise?
I agree. I've put it in my SuperMemo and very much look forward to going through it. Thanks Peter & Owen!
(midco developed this separately from our project last term, so this is actually my first read)
I have a lot of small questions.
What is your formal definition of the IEU u_i? What kinds of goals is it conditioning on (because IEU is what you compute after you view your type in a Bayesian game)?
Multi-agent "impact" seems like it should deal with the Shapley value. Do you have opinions on how this should fit in?
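(For reference, the standard Shapley value of player i under coalition value function v, with n = |N| players; it's the natural candidate for a multi-agent credit split:)

$$\varphi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$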
You note that your formalism has some EDT-like properties with respect to impact:
Well, in a sense, they do. The universes where player
I'm really excited about this project. I think that in general, there are many interesting convergence-related phenomena of cognition and rational action which seem wholly underexplored (see instrumental convergence, convergent evolution, universality of features in the Circuits agenda (cf. adversarial transferability), etc.).
My one note of unease is that an abstraction thermometer seems highly dual-use; if successful, this project could accelerate AI timelines. But that doesn't mean it isn't worth doing.
I still don't fully agree with OP but I do agree that I should weight this heuristic more.
Yeah, I think these are good points.
OK, if we're talking about central identity, then I very much wouldn't sign a contract giving away rights to my central identity. I interpreted the question to be about selling one's "immortal soul" (which supposedly goes to heaven if I'm good).
I think part of the lesson here is ‘don’t casually sell vaguely defined things that are generally understood to be some kind of big deal’
I guess I feel like this is a significant steelman and atypical of normal usage. In my ontology, that algorithm is closer to ‘mind.’
I agree that "soul" has more 'real' meaning than "florepti xor bobble." There's another point to consider, though, which is that many of us will privilege claims about souls with more credence than they realistically deserve, as an effect of having grown up in a certain kind of culture.
Out of all the possible metaphysical constructs which could 'exist', why believe that souls are particularly likely? Many people believing in souls is some small indirect evidence for them, but not an amount of evidence commensurate with the concept's prior improbability.
I think "Don't casually make contracts you don't intent to keep" is just pretty cruxy for me. This is a key piece of being a trustworthy person who can coordinate in complex, novel domains. There might be a price where there is worth it to do it as a joke, but $10 is way too low.
I agree that the contracts part was important, and I share this crux. I should have noted that. I did purposefully modify my hypothetical so that I wasn't becoming less trustworthy by signing my acquaintance's piece of paper.
This actually seems obviously wrong to me, if
My gut reaction is... okay, sure, maybe doing it ostentatiously is obnoxious, but these reasons against feel rather contrived.
(It's not at all a takedown to say "I disagree, your arguments feel contrived, bye!", but I figured I'd rather write a small comment than not engage at all)
If an acquaintance approached me on the street, asked me to sign a piece of paper that says "I, TurnTrout, give [acquaintance] ownership over my metaphysical soul" in exchange for $10 (and let's just ignore other updates I should make based on being approached with such a w... (read more)
Unless you're really desperate, it just seems like a bad idea to sign any kind of non-standard contract for $10. There's always a chance that you're misunderstanding the terms, or that the contract gets challenged at some point, or even that your signature on the contract is used as blackmail. Maybe you're trying to run for office or get a job at some point in the future, and the fact that you've sold your soul is used against you. The actual contract that Jacob references is long enough that even taking the time to read and understand it is worth signific... (read more)
I mean "soul" is clearly much closer to having a meaning than "florepti xor bobble". You can tell that an em is pretty similar to being a soul but hand sanitizer is not really. You know some properties that souls are supposed to have. There are various secular accounts of what a soul is that basically match the intuiton (e.g. your personality).
I actually started this essay thinking "eh, I don't think this matters too much", but by the end of it I was just like "yeah, this checks out."
Suppose instead that the acquaintance approached me with a piece of paper that says "I, TurnTrout, give [acquaintance] ownership over
where these people feel the need to express their objections even before reading the full paper itself
I'd very much like to flag that my comment isn't meant to judge the contributions of your full paper. My comment was primarily judging your abstract and why it made me feel weird/hesitant to read the paper. The abstract is short, but it is important to optimize so that your hard work gets the proper attention!
(I had about half an hour at the time; I read about 6 pages of your paper to make sure I wasn't totally off-base, and then spent the rest of the time... (read more)
I very much agree with Eliezer about the abstract making big claims. I haven't read the whole paper, so forgive any critiques which you address later on, but here are some of my objections.
I think you might be discussing corrigibility in the very narrow sense of "given a known environment and an agent with a known ontology, such that we can pick out a 'shutdown button pressed' event in the agent's world model, the agent will be indifferent to whether this button is pressed or not."
The discussion of the HPMOR epilogue in this recent April Fool's thread was essentially online improv, where no one could acknowledge that without ruining the pretense. Maybe I should do more improv in real life, because I enjoyed it!
I think it's pretty obvious.
Oh, another thing: I think it was pretty silly that Eliezer had Harry & co. infer the existence of the AI alignment problem and then had Harry solve the inner alignment problem.
I only read the HPMOR epilogue because - let's be honest - HPMOR is what LessWrong is really for.
(HPMOR spoilers ahead)
It's not clear to me why we need this tag.
It seems to me that deliberation can expand the domain of the value function. If I don’t know of football per se, but I’ve played a sport before, then I can certainly imagine a new game and form opinions about it. So I’m not sure how large the minimal set of generator concepts is, or if that’s even well-defined.
For this to matter, our alignment researchers need to be at the cutting edge of AI capabilities, and they need to be positioned such that their work can actually be incorporated into AI systems as they are deployed.
If we become aware that a lab will likely deploy TAI soon, other informed actors will probably become aware as well. This implies that many people would be trying to influence and gain access to this lab. Therefore, we should already have AI alignment researchers in positions of power within the lab before this happens.
I took Raj up on this generous offer. I'll post updates in the next few weeks as to how SM compares to Anki!
But perhaps a better way forward would be to define a new concept of "Useful power" or something like that, which equals your share of the total power in a zero-sum game.
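(Reading the quoted proposal literally, it seems to amount to something like:)

$$\text{UsefulPower}_i \;=\; \frac{\mathrm{POWER}_i}{\sum_j \mathrm{POWER}_j}$$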
I don’t see why useful power is particularly useful, since it’s taking a non-constant-sum quantity (outside of Nash equilibria) and making it constant-sum, which seems misleading.
But I also don’t see a problem with the “better play -> less exploitability -> less total Power” reasoning. This feels like a situation where our naive intuitions about power are just wrong, and if you think about it more, the formal result reflects a meaningful phenomenon.
I somehow agree with both you and OP, and also I don't buy part of the lever analogy yet. It seems important that the levers not only look similar, but that they be close to each other, in order to expect users to reliably mess up. Similarly, strong tool AI will offer many, many affordances, and it isn't clear how "close" I should expect them to be in use-space. From the security mindset, that's sufficient cause for serious concern, but I'm still trying to shake out the expected value estimate for powerful tool AIs -- will they be thermonuclear-weapon-like (as in your post), or will mistakes generally look different?
before you have a chance to do something useful
That statement seems far too strong, at least if you aren’t just talking about a very narrow subset of AI safety research (part of MIRI’s agenda). At a glance, that website gauges a skillset associated with one flavor of proof-based mathematics. For proof-based AI safety work, I think that the more important and general skill is: can you make meaningful formal conjectures and then prove them?
Do you like football? Well “football” is a learned concept living inside your world-model. Learned concepts like that are the only kinds of things that it’s possible to “like”. You cannot like or dislike [nameless pattern in sensory input that you’ve never conceived of]. It’s possible that you would find this nameless pattern rewarding, were you to come across it. But you can’t like it, because it’s not currently part of your world-model. That also means: you can’t and won’t make a goal-oriented plan to induce that pattern.
This was a ‘click’ for me, thanks.
Thanks so much for your comment! I'm going to speak for myself here, and not for Jacob.
That being said, I'm a bit underwhelmed by this post. Not that I think the work is wrong, but it looks like it boils down to saying (with a clean formal shape) things that I personally find pretty obvious: playing better at a zero-sum (or constant-sum) game means that the other players have less margin to get what they want. I don't feel that either the formalization of power or the theorem brings me any new insight, and so I have trouble getting interested. Maybe I'm just
Thanks for the detailed reply!
I want to go a bit deeper into the fine points, but my general reaction is "I wanted that in the post". You make a pretty good case for a way to arrive at this definition that makes it particularly exciting. On the other hand, I don't think that stating a definition and proving a single theorem that has the "obvious" quality (whether or not it is actually obvious, mind you) is that convincing.
The best way to describe my interpretation is that I feel that you two went for the "scientific paper" style, but the current state... (read more)
Probably going to reply to the rest later (and midco can as well, of course), but regarding:
Coming back after reading more, do you use σ_{-i} to mean "the strategy profile for every process except i"? That would make more sense of the formulas (since you fix a_i, there's no reason to have a σ_i), but if that's the case, then this notation is horrible (no offense).
By the way, indexing the other strategies by -i instead of, let's say, j or k is quite unconventional and confusing.
Using "σ−i" to mean "the strategy ... (read more)