6,264 posts were written in 2023.
662 of them got at least one positive Review Vote.
209 of them got at least one review, and a positive Review Vote total.
50 of them shall be displayed in the Best of LessWrong, Year 2023.
(Many of these ideas developed in conversation with Ryan Greenblatt)
In a shortform, I described some different levels of resources and buy-in for misalignment risk mitigations that might be present in AI labs:
...*The “safety case” regime.* Sometimes people talk about wanting to have approaches to safety such that if all AI developers followed these approaches, the overall level of risk posed by AI would be minimal. (These approaches are going to be more conservative than will probably be feasible in practice given the amount of competitive pressure, so I think it’s pretty likely that AI developers don’t actually hold themselves to these standards, but I agree with e.g. Anthropic that this level of caution is at least a useful hypothetical to consider.) This is the level of caution people...
I don't think you should think of "poor info flows" as something that a company actively does, but rather as the default state of affairs for any fast-moving organization with 1000+ people. Such companies normally need to actively fight against poor info flows, resulting in not-maximally-terrible-but-still-bad info flows.
This is a case where I might be over-indexing on my experience at Google, but I'd currently bet that if you surveyed a representative set of Anthropic and OpenAI employees, more of them would mostly agree with that statement than mostly disagree with it.
Crossposted from my blog, which many people are saying you should check out!
Imagine that you came across an injured deer on the road. She was in immense pain, perhaps having been mauled by a bear or seriously injured in some other way. Two things are obvious:
In such a case, it would be callous to say that the deer’s suffering doesn’t matter because it’s natural. Things can both be natural and bad—malaria certainly is. Crucially, I think in this case we’d see something deeply wrong with a person who thinks that it’s not their problem in any way, that helping the deer is of no value. Intuitively, we recognize that wild animals matter!
But...
Pollywogs (the larval form of frogs, after the egg stage and before they grow legs) are an example where huge numbers are produced and many die before ever growing into frogs, but from their perspective they probably get many, many minutes of happy growth, having been born into a time and place where quick growth is easy: watery and full of food.
Consider an alien species which requires oxygen, but for whom it was scarce during evolution, and so they were selected to use it very slowly and seek it ruthlessly, and feel happy when they manage to find some. I...
Sleep consolidation / "sleeping on it" is when you struggle with [learning a piano piece], sleep on it, and then you're suddenly much better at it the next day!
This has happened to me for piano, dance, math concepts, video games, & rock climbing, but it varies in effectiveness. Why? Is it:
My current guess is a mix of all four. But I'm unsure if you [practice piano] in the morning, you...
In case you have not heard, there have been some recent and not-so-recent killings allegedly by people who have participated in aspiring rationalist spaces.
I've put some links in a footnote for those who want the basic info.[1]
I believe many people are thinking about this and how to orient. People have questions like:
I hereby demarcate this thread as a place for people to ask and answer questions about this topic.
Three virtues for this thread are (1) Taking care of yourself, (2) Courage, and (3) Informing.
Airlines have standard advice to put your own oxygen mask on first, before helping others. The reasoning being that if...
In my experience, the rationality community in Vienna does not share any of the Bay Area craziness that I read about, so yeah, it seems plausible that different communities will end up significantly different.
I think there is a strong founder effect... new members will choose whether or not to join depending on how comfortable they feel among the existing members. Decisions like "we have these rules / we don't have any rules" or "there are people responsible for organization and safety / everyone needs to take care of themselves", once established,...
Finetuning could be an avenue for transmitting latent knowledge between models.
As AI-generated text increasingly makes its way onto the Internet, it seems likely that we'll finetune AI on text generated by other AI. If this text contains opaque meaning - e.g. due to steganography or latent knowledge - then finetuning could be a way in which latent knowledge propagates between different models.
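To make the pipeline concrete, here is a minimal sketch (purely illustrative; the model names and the toy one-step finetuning loop are assumptions for the example, not anything from an actual setup):

```python
# Illustrative sketch: finetune model B on text generated by model A.
# If A's generations carry opaque/steganographic signal, B may absorb it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model_a = AutoModelForCausalLM.from_pretrained("gpt2")         # "teacher" (generator)
model_b = AutoModelForCausalLM.from_pretrained("distilgpt2")   # "student" (finetuned)

# 1. Model A generates a corpus of text.
prompt = tok("The study found that", return_tensors="pt")
generated = model_a.generate(**prompt, max_new_tokens=64, do_sample=True)
corpus = tok.decode(generated[0], skip_special_tokens=True)

# 2. Model B is finetuned on that AI-generated corpus (one toy gradient step).
batch = tok(corpus, return_tensors="pt")
optimizer = torch.optim.AdamW(model_b.parameters(), lr=1e-5)
loss = model_b(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```

The worry is that whatever latent structure survives in the visible text also survives this kind of training, even when the text looks benign to a human reader.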
I haven't looked into this in detail, and I'm not actually sure how unique a situation this is. But, it updated me on "institutional changes to the US that might be quite bad[1]", and it seemed good if LessWrong folk did some sort of Orient Step on it.
(Please generally be cautious on LessWrong talking about politics. I am interested in people commenting here who have read the LessWrong Political Prerequisites sequence. I'll be deleting or at least unhesitatingly strong downvoting comments that seem to be doing unreflective partisan dunking)
((But, that's not meant to mean "don't talk about political actions." If this is as big a deal as it sounds, I want to be able to talk about "what to do?". But I want that talking-about-it to...
There are only three traders on this market; it means nothing at the moment. No need for virtue signalling to explain a result you might perceive as abnormal; the market just hasn't formed yet.
After adding the mean subtraction, the numbers haven't changed too much actually -- but let me make sure I'm using the correct calculation. I'm gonna follow your and @Adam Karvonen's suggestion of using the SAE bench code and loading my clustering solution as an SAE (this code).
v3 (KMeans): Layer blocks.3.hook_resid_post, n_tokens=100000, n_clusters=4096, variance explained = 0.8469745517
v3 (KMeans): Layer blocks.3.hook_resid_post, n_tokens=100000, n_clusters=16384, variance explained = 0.8537513018
v3 (KMeans): Layer blocks.4.hook_resid_post, n_tokens=10
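(For concreteness, here's a minimal sketch of the kind of variance-explained calculation I have in mind. This is illustrative only, not the actual SAE bench code, and the helper below is just my assumption about the setup.)

```python
# Illustrative sketch (not the actual SAE bench code): treat a KMeans clustering
# of residual-stream activations as a one-hot "SAE" and compute variance explained.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_variance_explained(acts: np.ndarray, n_clusters: int, seed: int = 0) -> float:
    """acts: [n_tokens, d_model] residual-stream activations (e.g. blocks.3.hook_resid_post)."""
    centered = acts - acts.mean(axis=0, keepdims=True)       # the mean subtraction step
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(centered)
    recon = km.cluster_centers_[km.labels_]                   # "decode" = nearest centroid
    residual = ((centered - recon) ** 2).sum()
    total = (centered ** 2).sum()
    return 1.0 - residual / total

# e.g. kmeans_variance_explained(acts_layer3, n_clusters=4096) over ~100k tokens
```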
... Due to linguistic relativity, might it be possible to modify or create a system of communication in order to make its users more aware of its biases?
If so, do any projects to actually do this exist?
Could you please ask for the specific examples of the Esperanto words? (I speak Esperanto.)
I think a similar example would be the adjective "Russian" in English, which translates to Russian as two different words: "русский" (related to Russian ethnicity or language) or "российский" (related to Russia as a country, i.e. including the minorities who live there).
(That would be "rus-a" vs "rus-land-a / rus-i-a" in Esperanto.)
I noticed this in a video where a guy explained that "I am Rus-land-ian, not Rus-ethnic-ian", which could be expressed in English as "I...
Summary:
This post outlines how a view we call subjective naturalism[1] poses challenges to classical Savage-style decision theory. Subjective naturalism requires (i) richness (the ability to represent all propositions the agent can entertain, including self-referential ones) and (ii) austerity (excluding events the agent deems impossible). It is one way of making precise certain requirements of embedded agency. We then present the Jeffrey–Bolker (JB) framework, which better accommodates an agent’s self-model and avoids forcing her to consider things she takes to be impossible.[2]
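For concreteness, the central object in the JB framework is a desirability function defined directly on propositions, tied to the agent's probabilities by Jeffrey's averaging axiom (stated here from memory rather than quoted from the post): for incompatible propositions $A$ and $B$ with $P(A \vee B) > 0$,

$$\mathrm{Des}(A \vee B) = \frac{P(A)\,\mathrm{Des}(A) + P(B)\,\mathrm{Des}(B)}{P(A) + P(B)}.$$

Because desirability is only defined on propositions the agent assigns positive probability, the framework never forces her to evaluate events she treats as impossible, which is the austerity requirement above.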
A naturalistic perspective treats an agent as part of the physical world—just another system subject to the same laws. Among other constraints, we think this means:
I'm more worried about counterfactual mugging and transparent Newcomb. Am I right that you are saying "in the first iteration of transparent Newcomb, austere decision theory gets no more than $1000, but then learns that if it modifies its decision theory into something more UDT-like it will get more money in similar situations", turning it into something like son-of-CDT?