I'm somewhat concerned about the problems that the recent increase in patch volume may cause during the creation of the Linux 7.0.1 release. In theory it's just a matter of checking that the entire set of patches applies to Linus's tree, but given the situation I think the cost of something getting missed is higher than normal[1].
I think the alternative of using the 6.19.XX series from Greg K-H until a few days after its last release is a better idea, but it's close: I'd put ~0.35 probability on it ending up worse[2]. I think better automation...
The existence of such facts seems plausible because if there were facts about what is rational (which seems likely) but no facts about how to become rational, that would seem like a strange state of affairs.
There might be facts about what's rational, but not about what utility function[1] it is right to use. Maybe a superintelligence could tell you (in a somewhat objective/convergent sense) what utility function to use, but the exact utility function would depend on the utility function of the superintelligence[2].
In Vladimir Nesov's opinion[3], even prese...
perhaps due to COVID stimulus money being lost / used up by retail traders
If most people in the US had a bank account whose monthly payments came anywhere close to the headline "interest rate," the government could curb risky retail investment with little delay by raising rates. That is not the case. Even assuming highly bounded rationality, it seems retail traders should not be losing as much money as they do, so maybe I'm making a modeling mistake and it would turn out that people really do dislike bank accounts. This may be a typical mind fallacy p...
In the third paragraph of the linked comment, I suggest a good thing the Glasswing companies could do for the rest of us. KVM is part of the Linux kernel, but the surrounding host programs aren't. Someone should commit to looking through all of these with Mythos (in public), so that all other computer users can base their security setup on that stack and then just track further software updates from those projects. This would require regular releases from the maintainers, however.
https://www.lesswrong.com/posts/rEiidwAug6htax2Wb/project-glasswing-anthropic-sho...
Outsiders like myself can do some things to take advantage of this program. Using software that is confirmed to get patches is the best option, but that can't cover all use cases. Use Chromium to watch videos[1], listen to audio, and read PDF/text/HTML documents; use Firefox to edit PDFs; use the latest Linux kernel from Greg Kroah-Hartman (not Linus's tree), from kernel.org or the repos of e.g. Debian testing or Arch Linux. I don't have a suggestion for reading `.epub` e-books, except writing a Haskell program using pure functions from the pandoc project to...
Spoiler
HJPEV is bound by a magical oath that prevents this human failing in the same way it is prevented in an agent that meets tiling desiderata. This is explicit in the text. E-Book draft, 2015, chapter 113.
Admittedly this assumes both that the "time of peril" hypothesis is correct and that the period can be handled while maintaining human freedom; the solution also (at maximum robustness) binds only until the end of this time.
I'll note that "not being sure what utility functions are in use" is generally (in the colloquial sense) not how standard game theory works. I don't seem to be competent enough at standard game theory to clearly write down the edge cases I suspect exist that could help with your understanding. This paragraph can serve as a placeholder for the case where I develop that competence.
As for non-standard game theory, you say you're reading the 2009 book The Bounds of Reason here[1], and I wonder if you've heard of the newer Translucent players: Explaini...
I recommend against the use of Math.random() in general, unless you are highly performance constrained. I've checked, and it appears browsers have commonly supported a better random source since 2015[1]/early 2016 at the latest. The code below should be entirely correct to replace both the primary and fallback UUIDv4 generation code, once adapted to a function in a TS module.
// Function available since 2015.
const uuid_b = new Uint8Array(16);
self.crypto.getRandomValues(uuid_b);
let uuid_hex = "";
for (let i = 1; i <...

I may want to say something about your requirements in the future. If that is the case you can verify the latest possible writing time using the cryptographic commitment.
HMAC-SHA2-256(INPUT, HMAC_KEY)=2d5c9d62761f420e57919f4bf39f44cfe8ff3740322221b61f32de01e7e8786f
SHA3-224(HMAC_KEY)=80a2da01146495971b9ccf9fa9c20405cf582d091073aa985348cd1e

Cryptography Note
Note that this commitment mechanism isn't particularly secure. "Make the outputs longer" isn't something that helps by itself. If you know cryptography you may be able to get closer to Yudkowsky's hypothe
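For reference, here is a completed sketch of the UUIDv4 replacement whose loop was cut off above. The RFC 4122 version/variant bit handling and the dash placement are my reconstruction, not part of the original snippet, and the `globalThis.crypto` fallback is only there so it also runs under older Node:

```javascript
// Sketch of the full replacement, assuming a runtime that exposes the
// Web Crypto API (browsers since ~2015; Node via globalThis.crypto or
// require("crypto").webcrypto).
function uuidv4() {
  const cryptoObj = globalThis.crypto ?? require("crypto").webcrypto;
  const uuid_b = new Uint8Array(16);
  cryptoObj.getRandomValues(uuid_b);
  // RFC 4122: set the version (4) and variant (10xx) bits.
  uuid_b[6] = (uuid_b[6] & 0x0f) | 0x40;
  uuid_b[8] = (uuid_b[8] & 0x3f) | 0x80;
  let uuid_hex = "";
  for (let i = 0; i < 16; i++) {
    uuid_hex += uuid_b[i].toString(16).padStart(2, "0");
    // Dashes after bytes 3, 5, 7, 9 give the 8-4-4-4-12 layout.
    if (i === 3 || i === 5 || i === 7 || i === 9) uuid_hex += "-";
  }
  return uuid_hex;
}
```

The result has the usual `xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx` shape, with `y` in `[89ab]`.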
Accepting that framing, I would characterize it as optimizing for inexploitability and resistance to persuasion over peak efficiency.
Alternatively, this job/process could be described as consisting of a partially separate skill or set of skills. It appears to be an open problem how to extract useful ideas from an isolated context[1] without distorting them in a way that would lead to problems, while also not letting out any info-hazards or malicious programs. Against adversaries (accidental or otherwise) below superintelligence, a human may be able to ...
This bodes very poorly, and we should probably make sure we have a strategic reserve of AI safety researchers who do NOT talk to models going forward (to his credit, Davidad recommends this anyway).
I previously followed a more standard safety protocol[1] but that might not be enough when considering secondary exposure to LLM conversations highly selected by someone already compromised.
By my recollection[2], a substantial percentage of the LLM outputs I've ever seen have been selected or amplified in distribution by Janus.
From now on I won't read anything by ...
One point of this framework is to distinguish "sharing values" from "actually trusting each other". There are cases where agents share values but don't trust each other, or get stuck in coordination traps.
In Wei Dai's thinking, having the same values/utility function means that two agents care about the exact same things. This is formalized in UDT, but it's also a requirement you can add to most decision theories, e.g. CDT with reflective oracles (or some other mostly lawful incomplete measure). This is normally described as requiring that the utility funct...
The code doesn't look like it would cause catastrophic problems. The main risk to end users at the current level of testing is a bug causing important information to be missed. My ability to comment on the risk to a developer is limited, however, because I haven't read the source code of all the development dependencies.
I have visually checked (as a human) the dist/power-reader.user.js file. End users should be relatively safe copying this into their browser plugins, as long as all plugins have no relevant security problems or malicious code. As mentioned b...
[Epistemic Status: Moderate confidence due to potential differences in Anthropic's stated and actual goals. Assumes there is no discoverable objective morality/ethics for the sake of argument, but also that the AI would discover that instead of causing catastrophe.]
It seems that Claude's constitution weakly to moderately suggests that an AI should not implement this proposal. Do you want to ask Anthropic to change it? I give further details and considerations for action below.
The constitution is a long document, but it is broken into sections in a relative...
That you, Kokotajlo, do not immediately dismiss the idea is "evidence" to the extent that you stand in for the AI researchers who might make the decision. In quotes because a logically omniscient (e.g. perfect Bayesian) agent would presumably already have a good guess and not update much, if at all. On the other hand, agents with (small) finite compute can run experiments or otherwise observe events and use the results to improve their "mathematical intuition", which is then used in a similar way to the "mathematical intuition module" in UDT, except with the sacrific...
I saw this message without context in my mailbox and thought to write that this was an unsolved problem[1], that things which simply are not true can't stand up very well in a world model; but this seems like something an intelligent human like Amodei or Musk should be able to do. A 99% "probability" (a guess by a human) on ¬ai_doom should not be able to fix enough detail to directly contradict reasoning on the counterlogical/counterfactual where doom instead happens. Any failure to carry out this reasoning task seems like a simple failure of reasoning in lo...
Note that this link is broken. It should go to Eliezer's top comment here[1]:
https://www.lesswrong.com/posts/SpHYBhkaeDZpZyRvj/what-can-you-do-with-an-unfriendly-ai?commentId=5p7nw3RzLShRftnt8
https://web.archive.org/web/20220121014447/https://www.lesswrong.com/posts/SpHYBhkaeDZpZyRvj/what-can-you-do-with-an-unfriendly-ai