The main way you produce a treacherous turn is not by "finding the treacherous turn capabilities," it's by creating situations in which sub-human systems have the same kind of motive to engage in a treacherous turn that we think future superhuman systems might have.
When you say "motive" here, is it fair to reexpress that as: "that which determines by what method and in which directions capabilities are deployed to push the world"? If you mean something like that, then my worry here is that motives are a kind of relation involving capabilities, not somet...
Creating in vitro examples of problems analogous to the ones that will ultimately kill us, e.g. by showing agents engaging in treacherous turns due to reward hacking or exhibiting more and more of the core features of deceptive alignment.
A central version of this seems to straightforwardly advance capabilities. The strongest (ISTM) sort of analogy between a current system and a future lethal system would be that they use an overlapping set of generators of capabilities. Trying to find an agent that does a treacherous turn, for the same reasons as a f...
Also, "No one knows how to make AI systems that try to do what we'd want them to do."
I'm asking what reification is, period, and what it has to do with what's in reality (the thing that bites you regardless of what you think).
How do they explain why it feels like there are noumena? (Also by "feels like" I'd want to include empirical observations of nexusness.)
In those scenarios, does Half-Ass seem like more of a "thing"?
IDK, but I like the question.
I'd say that what does seem like a thing is [insertion f-sort] where the fraction f is a parameter. Then [insertion 1/2-sort] is like [this particular instance of me picking up my water bottle and taking a drink], and [insertion f-sort] is like [me picking up my water bottle and taking a drink, in general].
Unless there's something interesting about [insertion 1/2-sort] in particular, like for example if there's some phase transition at 1/2 or something. Then I'd e...
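(To make the parameterization concrete, here's a minimal Python sketch of one way [insertion f-sort] could be cashed out; the cutoff rule is my guess, not something from the post:)

import math

def insertion_f_sort(xs, f):
    # One guess at [insertion f-sort]: run the outer loop of insertion
    # sort over only the first ceil(f * n) elements, so that prefix ends
    # up sorted and the rest of the list is left untouched.
    xs = list(xs)  # don't mutate the caller's list
    k = math.ceil(f * len(xs))
    for i in range(1, k):
        x = xs[i]
        j = i - 1
        while j >= 0 and xs[j] > x:  # shift larger prefix elements right
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = x
    return xs

# insertion_f_sort(data, 1.0) is ordinary insertion sort;
# insertion_f_sort(data, 0.5) is then one reading of Half-Ass.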
Yeah, VoI seems like a better place to defer. Another sort of general solution, which I find difficult but others might find workable, is to construct theories of other perspectives. That lets there be sort of unlimited space to defer: you can do something that looks like deferring, but is more precisely described as creating a bunch of inconsistent theories in your head, and deferring to people about what their theory is, rather than what's true. (I run into trouble because I'm not so willing to accept others' languages if I don't see how they're using words consistently.)
(See my response to the parent comment.)
What I mean is, suppose the deferred-to has some belief X. This X is a refined, theoretically consilient belief to some extent, and to some extent it isn't, but is instead pre-theoretic; intuitive, pragmatic, unreliable, and potentially inconsistent with other beliefs. What happens when the deferred-to takes practical, externally visible action, which is somehow related to X? Many of zer other beliefs will also play a role in that action, and many of those beliefs will be to a large extent pre-theoretical. Pre-theoreticalness is contagious, in action: theo...
But you should be clear about just what it is you're studying
The post discusses the thingness of things. Seven, for example, seems like a thing--an entity, an object. I naturally mentally relate to seven in many of the same ways that I naturally mentally relate to a table. So the question is, what if anything is in common between how we relate to each of those entities that seem like things?
Half-Ass is a reasonable example of a somewhat non-thing, according to the hypothesis in the post. It refers one fairly strongly to "half" and to "insertion sort", but "insertion sort" barely refers one back to Half-Ass, and likewise "half".
Things are reified out of sensory experience of the world (though note that "sensory" is redundant here), and the world is the unified non-thing
Okay, but the tabley-looking stuff out there seems to conform more parsimoniously to a theory that posits an external table. I assume we agree on that, and then the question is, what's happening when we so posit?
You might be interested in Quine's collection of essays Theories and Things, where he defends some version of "things as space-time regions". I'm pretty skeptical of your version though; or at least, I'm interested in why 7 seems like an object.
if you define the central problem as something like building a system that you'd be happy for humanity to defer to forever.
[I at most skimmed the post, but] IMO this is a more ambitious goal than the IMO central problem. IMO the central problem (phrased with more assumptions than strictly necessary) is more like "building a system that's gaining a bunch of understanding you don't already have, in whatever domains are necessary for achieving some impressive real-world task, without killing you". So I'd guess that's supposed to happen in step 1. It's debata...
Another reason: internals may not strongly indicate what an agent is ultimately trying to do. https://tsvibt.blogspot.com/2022/12/ultimate-ends-may-be-easily-hidable.html
1. Non-Distinguished Internal Code Variants
Maybe related: https://tsvibt.blogspot.com/2022/10/the-conceptual-doppleganger-problem.html
Oh I see, the haploid cells are, like, independently viable and divide and stuff.
Cool! Is it known how to sequence the haploid cell? Can you get a haploid cell to divide so you can feed it into PCR or something? (I'm a layperson w.r.t. biology.) I just recently had an idea about sequencing sperm by looking at their meiotic cousins and would be interested in talking in general about this topic; email at gmail, address tsvibtcontact. https://tsvibt.blogspot.com/2022/06/non-destructively-sequencing-gametes-by.html
I haven't looked really, seems worth someone doing. I think there's been a fair amount of experimentation, though maybe a lot of it is predictably worthless (e.g. by continuing to inflict central harms of normal schooling), I don't know. (This post is mainly aimed at adding detail to what some of the harms are, so that experiments can try to pull the rope sideways on supposed tradeoffs like permissiveness vs. strictness or autonomy vs. guidance.) I looked a little. Aside from Montessori (which would take work to distinguish things branded as Montessori vs...
[To respond to not the literal content of your comment, in case it's relevant: I think some teachers are intrinsically bad, some are intrinsically great, and many are unfortunately compelled or think they're compelled to try to solve an impossible problem and do the best they can. Blame just really shouldn't be the point, and if you're worried someone will blame someone based on a description, then you may have a dispute with the blamer, not the describer.]
criticism of schools unrealistic
Well, it's worth distinguishing (1) whether/what harms are being ...
Afterthoughts:
-- An attitude against pure symbolism is reflected in the Jewish prohibition against making a bracha levatala (= idle, null, purposeless). That's why Jews hold their hands up to the havdalah candle: not to "feel the warmth of Shabbat" or "use all five senses", but so that the candle is being actually used for its concrete function.
-- An example from Solstice of a "symbolic" ritual is the spreading-candle-lighting thing. I quite like the symbolism, but also, there's a hollowness; it's transparently symbolic, and on some level what's communicat...
I speculate (based on personal glimpses, not based on any stable thing I can point to) that there's many small sets of people (say of size 2-4) who could greatly increase their total output given some preconditions, unknown to me, that unlock a sort of hivemind. Some of the preconditions include various kinds of trust, of common knowledge of shared goals, and of person-specific interface skill (like speaking each other's languages, common knowledge of tactics for resolving ambiguity, etc.).
[ETA: which, if true, would be good to have already set up before crunch time.]
In modeling the behavior of the coolness-seekers, you put them in a less cool position.
It might be a good move in some contexts, but I feel resistant to taking on this picture, or recommending others take it on. It seems like making the same mistake. Focusing on the object level because you want to be [cool in that you focus on the object level], that does have the positive effect of focusing on the object level, but I think it can also just as well have all the bad effects of trying to be in the Inner Ring. If there's something good about getting into the Inn...
I agree that the epistemic formulation is probably more broadly useful, e.g. for informed oversight. The decision theory problem is additionally compelling to me because of the apparent paradox of having a changing caring measure. I naively think of the caring measure as fixed, but this is apparently impossible because, well, you have to learn logical facts. (This leads to thoughts like "maybe EU maximization is just wrong; you don't maximize an approximation to your actual caring function".)
In case anyone shared my confusion:
The while loop where we ensure that eps is small enough so that
bound > bad1() + (next - this) * log((1 - p1) / (1 - p1 - eps))
is technically necessary to ensure that bad1() doesn't surpass bound, but it is immaterial in the limit. Solving
bound = bad1() + (next - this) * log((1 - p1) / (1 - p1 - eps))
gives
eps >= (1/3) (1 - e^{-[bound - bad1()] / [next - this]})
which, using the log(1+x) = x approximation, is about
(1/3) ([bound - bad1()] / [next - this]).
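(Spelling out the solve, in case anyone else got stuck on the constants; I'm assuming p1 = 1/3 in the relevant proof, which I might have wrong:)

$$\mathrm{bound} = \mathrm{bad1}() + (\mathrm{next}-\mathrm{this})\log\frac{1-p_1}{1-p_1-\varepsilon} \;\Longrightarrow\; \varepsilon = (1-p_1)\Bigl(1 - e^{-[\mathrm{bound}-\mathrm{bad1}()]/[\mathrm{next}-\mathrm{this}]}\Bigr)$$

With p_1 = 1/3 the exact solution is (2/3)(1 - e^{...}); the while loop only guarantees eps is at least half of that, hence the (1/3) factor above.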
Then Scott's comment gives the rest. I was worried about the
...Could you spell out the step
every iteration where mean(𝙴[𝚙𝚛𝚎𝚟:𝚝𝚑𝚒𝚜])≥2/5 will cause bound - bad1() to grow exponentially (by a factor of 11/10=1+(1/2)(−1+2/5𝚙𝟷))
a little more? I don't follow. (I think I follow the overall structure of the proof, and if I believed this step I would believe the proof.)
We have that eps is about (2/3)(1 - exp([bad1() - bound]/(next - this))), or at least half that, but I don't see how to get a lower bound on the decrease of bad1() (as a fraction of bound - bad1()).
(Upvoted, thanks.)
I think I disagree with the statement that "Getting direct work done." isn't a purpose LW can or should serve. The direct work would be "rationality research"---figuring out general effectiveness strategies. The sequences are the prime example in the realm of epistemic effectiveness, but there's lots of open questions in productivity, epistemology, motivation, etc.
This still incentivizes prisons to help along the death of prisoners that they predict are more likely than the prison-wide average to repeat-offend, in the same way average utilitarianism recommends killing everyone but the happiest person (so to speak).
I see. That could be right. I guess I'm thinking about this (this = what to teach/learn and in what order) from the perspective of assuming I get to dictate the whole curriculum. In which case analysis doesn't look that great, to me.
Ok that makes sense. I'm still curious about any specific benefits that you think studying analysis has, relative to other similarly deep areas of math, or whether you meant hard math in general.
(See reply there.)
Seems like it's precisely because of the complicated technical foundation that real analysis was recommended.
What I'm saying is, that's not a good reason. Even the math with simple foundations has surprising results with complicated proofs that require precise understanding. It's hard enough as it is, and I am claiming that analysis is too much of a filter. It would be better to start with the most conceptually minimal mathematics.
...Even great mathematicians ran into trouble playing fast and loose with the real numbers. It took them about two hundred y
Could you say more about why you think real analysis specifically is good for this kind of general skill? I have pretty serious doubts that analysis is the right way to go, and I'd (wildly) guess that there would be significant benefits from teaching/learning discrete mathematics in place of calculus. Combinatorics, probability, algorithms; even logic, topology, and algebra.
To my mind all of these things are better suited for learning the power of proof and the mathematical way of analyzing problems. I'm not totally sure why, but I think a big part of it i...
PSA: If you wear glasses, you might want to take a look behind the little nosepads. Some... stuff... can build up there. According to this unverified source it is oxidized copper from glasses frame + your sweat, and can be cleaned with an old toothbrush + toothpaste.
There are ten thousand wrong solutions and four good solutions. You don't get much info from being told a particular bad solution. The opposite of a bad solution is a bad solution.
Lol yeah ok. I was unsure because Alexa says 9% of search traffic to LW is from "demetrius soupolos" and "traute soupolos" so maybe there was some big news story I didn't know about.
http://www.alexa.com/siteinfo/lesswrong.com
(The recent uptick is due to hpmor, I suppose?)
I'd say your first thought was right.
...She noticed half an hour later on, when Harry Potter seemed to sway a bit, and then hunch over, his hands going to cover up his forehead; it looked like he was prodding at his forehead scar. The thought made her slightly worried; everyone knew there was something going on with Harry Potter, and if Potter's scar was hurting him then it was possible that a sealed horror was about to burst out of his forehead and eat everyone. She dismissed that thought, though, and continued to explain Quidditch facts to the historicall
Hermione the Hi-Fi Heiress of Hufflepuff and Harry the Humanist
As a simple matter of fact, Voldemort is stronger than Harry in basically every way, other than Harry's (incomplete) training in rationality. If Voldemort were a good enough planner, there's no way he could lose; he is smarter, more powerful, and has more ancient lore than any other wizard. If Voldemort were also rational, and didn't fall prey to overconfidence bias / planning fallacy...
Well, you can be as rational as you like, but if you are human and your opponent is a superintelligent god with a horde of bloodthirsty nanobots, the invincible Elder Ligh...
A brief and terrible magic lashed out from the Defense Professor's wand, scouring the hole in the wall, scarring the huge chunk of metal that lay in the room's midst; as Harry had requested, saying that the method he'd used might identify him.
I'm kind of worried about this... all the real attempted solutions I've seen use partial transfiguration. But if we take "the antagonist is smart" seriously, and given the precedent for V remembering and connecting obscure things (e.g. the Resurrection Stone), we should assume V has protections ...
Didn't V see at least the results of a Partial Transfiguration in Azkaban (used to cut through the wall)? Doesn't seem like something V would just ignore or forget.
Since they are touching his skin, does he need his wand to cancel the Transfiguration?
Right, this is a stronger interpretation.
This is persuasive, but... why the heck would Voldemort go to the trouble of breaking into Azkaban instead of grabbing Snape or something?
Oh right, good point. Likewise Nott.
So, like, is Snape in that crowd of Death Eaters, or what?
FYI, each sequence is (very roughly) 20,000 words.
(Presumably Parseltongue only prevents willful lies.)
Quirrell also claims (not in Parseltongue):
Occlumency cannot fool the Parselmouth curse as it can fool Veritaserum, and you may put that to the trial also.
It seems like what you can say in Parseltongue should only depend on the actual truth and on your mental state. What happens if I Confundus / Memory Charm someone into believing X? Can they say X in Parseltongue? If they can say it just because they believe it, then Parseltongue is not so hard to bypass; I just Confundus myself (or get someone t...
If Parseltongue depended only on the actual truth of the world, Voldemort would have won already, because you can then pull single bits of arbitrary information out of the aether one at a time.
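(A toy model, to make the one-bit-per-utterance point concrete; try_say and the bit encoding are hypothetical stand-ins, nothing from the text:)

SECRET = 0b101101  # stands in for some arbitrary fact about the world

def try_say(i, claimed_value):
    # Toy Parseltongue that consults only the truth of the world: a
    # sentence can be spoken iff it's true. Here the "world" is just
    # the bits of SECRET.
    return ((SECRET >> i) & 1) == claimed_value

def extract_secret(num_bits):
    # Attempt to assert "bit i of the secret is 1"; whether the words
    # come out reveals the bit. One bit of the world per utterance.
    return [1 if try_say(i, 1) else 0 for i in range(num_bits)]

print(extract_secret(6))  # [1, 0, 1, 1, 0, 1]: the bits of SECRET, low to high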
So for example, say Alice runs this experiment:
Alice observes that A learns to hack B. Then she solves this as follows:
Alice observes that A doesn't hack B. Then Bob looks at Alice's results and says,
"Cool. But this won't generalize to future lethal systems because it doe... (read more)