All of habryka's Comments + Replies

At least for the coming year, our expenses are pretty entangled between all the different projects in a way that makes differentially funding things hard. I do take our donors' preferences about how to focus our efforts into account, so donating and just telling us that you would prefer us to work more on one kind of thing vs. another will have some effect.

My guess is you will mostly just have to average our impact across different areas and decide whether the whole portfolio is above your bar.

When we were looking for office space in Berkeley earlier this year we were seeing list price between $3.25-$3.75/month per square foot, or $780k-900k/year for 20,000 square feet. I'd expect with negotiation you could get somewhat better pricing than this implies, especially if committing to a longer time period.

Yep, if you commit for longer time periods you can definitely get better deals, and there are definitely other ways to save on office space costs. I didn't mean to imply this was the minimum you could rent office space for.

The $1.2M/yr estimate was... (read more)

With that in mind, I was surprised by the lack of information in this funding request. I feel mixed about this: high-status AIS orgs often (accurately) recognize that they don't really need to spend time justifying their funding requests, but I think this often harms community epistemics (e.g., by leading to situations where everyone is like "oh X org is great-- I totally support them" without actually knowing much about what work they're planning to do, what models they have, etc.)

Sorry about that! I've drafted like 3-4 different fundraising posts over th... (read more)

4Akash1d
Thanks for this detailed response; I found it quite helpful. I maintain my "yeah, they should probably get as much funding as they want" stance. I'm especially glad to see that Lightcone might be interested in helping people stay sane/grounded as many people charge into the policy space.  This seems quite reasonable to me. I think it might've been useful to include something short in the original post that made this clear. I know you said "also feel free to ask any questions in the comments"; in an ideal world, this would probably be enough, but I'm guessing this isn't enough given power/status dynamics.  For example, if ARC Evals released a post like this, I expect many people would experience friction that prevented them from asking (or even generating) questions that might (a) make ARC Evals look bad, (b) make the commenter seem dumb, or (c) potentially worsen the relationship between the commenter and ARC evals.  To Lightcone's credit, I think Lightcone has maintained a (stronger) reputation of being fairly open to objections (and not penalizing people for asking "dumb questions" or something like that), but the Desire Not to Upset High-status People or Desire Not to Look Dumb In Front of Your Peers By Asking Things You're Already Supposed to Know are strong.  I'm guessing that part of why I felt comfortable asking (and even going past the "yay, I like Lightcone and therefore I support this post" to the mental motion of "wait, am I actually satisfied with this post? What questions do I have") is that I've had a chance to interact in-person with the Lightcone team on many occasions, so I felt considerably less psychological friction than most. All things considered, perhaps an ideal version of the post would've said something short like "we understand we haven't given any details about what we're actually planning to do or how we'd use the funding. This is because Oli finds this stressful. But we actually really want you to ask questions, even "dumb question

Will much of that $3-6M go into renovating and managing the Rose Garden Inn, or to cover work that could have been covered by existing funding if the Inn wasn't purchased?

Thinking about the exact financing of the Inn is a bit messy, especially if we compare it to doing something like running the Lightcone Offices, because of stuff like property appreciation, rental income from people hosting events here, and the hard-to-quantify costs of tying up capital in real estate as opposed to more liquid assets like stocks.

If you assume something like 5% property ap... (read more)

8AdamGleave2d
I'm sympathetic to the high-level claim that owning property usually beats renting if you're committing for a long time period. But the comparison with WeWork seems odd: WeWork specializes in providing short-term, serviced office space and does so at a substantial premium to the more traditional long-term, unserviced commercial real estate contract. When we were looking for office space in Berkeley earlier this year we were seeing list price between $3.25-$3.75/month per square foot, or $780k-900k/year for 20,000 square feet. I'd expect with negotiation you could get somewhat better pricing than this implies, especially if committing to a longer time period. Of course, the extra outdoor space, mixed-use zoning and ability to highly customize the space may well offset this. But it starts depending a lot more on the details (e.g. how often is the outdoor space used; how much more productive are people in a customized space vs a traditional office) than it might first seem.
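As a quick sanity check of the figures above (a sketch; the $3.25-$3.75/sq ft/month list prices and the 20,000 sq ft size are taken directly from the comment, not independent data):

```python
# Back-of-the-envelope annual rent from the quoted Berkeley list prices.
SQUARE_FEET = 20_000  # office size assumed in the comment

def annual_rent(price_per_sqft_per_month: float, sq_ft: int = SQUARE_FEET) -> int:
    """Annual rent in dollars for a given monthly per-square-foot list price."""
    return round(price_per_sqft_per_month * sq_ft * 12)

low, high = annual_rent(3.25), annual_rent(3.75)
print(f"${low:,}-${high:,} per year")  # $780,000-$900,000 per year
```

This reproduces the quoted $780k-900k/year range exactly, confirming the two sets of numbers are consistent.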
5rachelAF2d
Thank you for such a detailed and thorough answer! This resolves a lot of my confusion. Based on conversations around closing the WeWork Lightcone office, I had assumed that you didn't want to continue hosting office space, and so hadn't considered that counterfactual cost. But the Inn expenses you mention seem more reasonable if the alternative is continuing to rent WeWork space. The FTX context also makes a lot of sense. I was confused how the purchase fit into your current strategy and funding situation, but I understand that both of those were quite different a year or two ago. Given how much things have changed, do you have conditions under which you would decide to sell the space and focus on other projects? Or are you planning to hold onto it no matter what, and decide how best to use it to support your current strategy as that develops?
4Quinn2d
I'm extremely excited by and supportive of this comment! An especially important related area, I think, is "solving the deference problem", or the cascades of a sinking bar in forecasting and threat modeling that I've felt over the last couple of years.

We are just wrapping up renovations, so not much yet (though we will be done very soon). This summer we are likely hosting a good chunk of the SERI MATS scholars, as well as providing space for various other retreats and events (like the Singular Learning Theory workshop, and we are talking to Manifold about maybe running a 100+ person forecasting conference here).

In-parallel we are also providing office space to a small number of people that I expect to slowly grow over time, trying to build a tight-knit community of people working to reduce existen... (read more)

Those names do seem like at least a bit of an update for me.

I really wish that having someone EA/AI-Alignment affiliated who has expressed some concern about x-risk was a reliable signal that a project will not end up primarily accelerationist, but alas, history has really hammered it in for me that that is not reliably true. 

Some stories that seem compatible with all the observations I am seeing: 

  • The x-risk concerned people are involved as a way to get power/resources/reputation so that they can leverage it better later on
  • The x-risk concerned pe
... (read more)

Agreed: the initial announcement read like AI safety-washing, and more political action is needed, hence the call to action to improve this.

But read the taskforce leader’s op-ed

  1. He signed the pause AI petition.
  2. He cites ARC’s GPT-4 evaluation and Lesswrong in his AI report which has a large section on safety.
  3. “[Anthropic] has invested substantially in alignment, with 42 per cent of its team working on that area in 2021. But ultimately it is locked in the same race. For that reason, I would support significant regulation by governments and a pr
... (read more)

I am confused why you are framing this in a positive way? The announcement seems to primarily be that the UK is investing $125M into scaling AI systems in order to join a global arms race to be among the first to gain access to very powerful AI systems.

The usage of "safety" in the article seems to have little to do with existential risk, and indeed seems mostly straightforward safety-washing. 

Like, I am genuinely open to this being a good development, and I think a lot of recent development around AI policy and the world's relationship to AI risk has been good, but I do really have trouble seeing how this announcement is a positive sign. 

6Hauke Hillebrandt4d
Ian Hogarth is leading the task force [https://news.sky.com/story/sunak-eyes-songkick-founder-to-chair-governments-ai-taskforce-12900458] who's on record saying that AGI could lead to “obsolescence or destruction of the human race” if there’s no regulation on the technology’s progress.  Matt Clifford is also advising the task force - on record having said the same thing [https://twitter.com/matthewclifford/status/1666685601389719552] and knows a lot about AI safety. He had Jess Whittlestone & Jack Clark [https://podcasts.apple.com/gb/podcast/jess-whittlestone-jack-clark-what-governments-should/id1547838601?i=1000533823252] on his podcast.  If mainstream AI safety is useful and doesn't increase capabilities, then the taskforce and the $125M seem valuable. If it improves capabilities, then it's a drop in the bucket in terms of overall investment going into AI.

If you get funding from other funds, it would be best if you update your application (you can edit your application any time before the evaluation period ends), or withdraw your application. We'll get notifications if you make edits and make sure to consider them. 

2jacquesthibs8d
Perfect, thanks!

Yep, just paying a person a salary works, though the person needs to do enough things that are legibly for the public benefit to justify their salary to the IRS.

In an environment where EA organizations don't seem to keep their promises about responding within promised timeframes, write things like "we'll get back to you" and then don't, and seem willing to accept the negative mental health consequences that come along with that, is there a good reason why people should expect a different practice from this new process?

Yeah, it's a pretty fair criticism. I am quite confident we will keep the "responses around the start of August" mark, because that one is pretty inherently baked into our evaluation process. And I ... (read more)

You have a clause about China and India, but not about Russia. So, Russia is OK? (Among other things, in Russia, it is difficult to receive money from abroad: many banks are disconnected from SWIFT, some of the rest have stopped working with transactions from the US on their own initiative, and there is a chance that a Western bank will refuse to conduct a transaction to Russia. So the most reliable way is to have a trusted intermediary person with money in a bank account in Russia and a second bank account somewhere else.)

I think Russia is marginally more... (read more)

Lightspeed Grants is definitely meaningfully modeled as being a kind of spinoff of the SFF, and also as a way to create more competition between different funding distribution mechanisms for Jaan and other funders. 

This means for this round there are a lot of similarities on the backend, though I do expect the applicant experience to already be quite different. And then I expect much more heavy divergence in future rounds as we have more end-to-end ownership over the product, which allows us to make more changes (I've already made a lot of changes to ... (read more)

1Ian David Moss9d
Thank you for the explanation!

Yeah, I do think there are a bunch of benefits to doing things in Google Docs, though it is often quite useful to have more structured data on the evaluation side.

This increases re-usability and decreases stress: since it's easy to make updates later on, there's less worry that you ended up missing something crucial.

You can actually update your application any time. When you submit the application you get a link that allows you to edit your submission as well as see its current status in the evaluation process. Seems like we should sign-post this better.

Oh, I actually like the Lightspeed Grants logo more. 

It's the Lightcone Logo with a dollar sign in it!

4Zach Stein-Perlman10d
Ah, I am no longer confused!

Intros would be great! Now that we've launched I've been planning to reach out to more potential funders, and I think we will very likely get more good applications than we have funding for.

Feel free to send me a DM or send me an email at habryka@lightspeedgrants.org to coordinate.

It is kind of a logistical headache to handle withdrawn applications after we've figured out a funding allocation, though it's not that bad.

If you do have a lot of uncertainty about whether you will actually want to go ahead with the project (or think it's somewhat conditional on funder enthusiasm), I think it's best to choose the "get a response within 2 weeks" option. I think that's also the best option if you are applying for multiple projects (in which case I would recommend filling out one application that gets processed in the 60-day window, and then some secondary applications that you might pivot to if you get funding within the 2-week window).

Might we perhaps refrain from dismissing it if we can't even remember what the prior proposals were?

I mean, I definitely remember! I could summarize them, I just don't have a link ready, since they were mostly in random comment threads. I might go through the effort of trying to search for things, but the problem is not one of remembering, but one of finding things in a sea of 10 years of online discussion in which many different terms have been used to point to the relevant ideas.

The linked post argues that this has important safety implications. So point

... (read more)
3cdkg11d
My intention is not to criticize you in particular! Let me describe my own thought process with respect to the originality of work. If I get an academic paper to referee and I suspect that it's derivative, I treat it as my job to demonstrate this by locating a specific published work that has already proposed the same theory. If I can't do this, I don't criticize it for being derivative. The epistemic rationale for this is as follows: if the experts working in an area are not aware of a source that has already published the idea, then even if the idea has already been published somewhere obscure, it is useful for the epistemic community to have something new to cite in discussing it. And of course, if I've discussed the idea in private with my colleagues but the paper I am refereeing is the first discussion of the idea I have seen written down, my prior discussions do not show the idea isn't original — my personal discussions don't constitute part of the collective knowledge of the research community because I haven't shared them publicly. It's probably not very fruitful to continue speculating about whether Gwern read the linked paper. It does seem to me that your disagreement directly targets our thesis in the linked paper (which is productive), whereas the disagreement I quoted above took Simon to be making the rather different claim that GPTs (considered by themselves) are not architecturally similar to Gato.

With all due respect to Gwern, repeating claims that work has already been done and then refusing to substantiate them is an epistemic train wreck.

I don't think that's what's happening here, so I feel confused about this comment. I haven't seen Gwern 'refuse to substantiate them'. He indeed commented pretty extensively about the details of your comment. 

Shutdown-seekingness has definitely been discussed a bunch over the years. It seems to come up a lot in Tool-AI adjacent discussions as well as impact measures. I also don't have a great link here sadl... (read more)

6cdkg11d
I'm referring to this exchange: I find it odd that so many people on the forum feel certain that the proposal in the post has already been made, but none are able to produce any evidence that this is so. Might the present proposal perhaps be different in important respects from prior proposals? Might we perhaps refrain from dismissing it if we can't even remember what the prior proposals were? The interesting thing about language agent architectures is that they wrap a GPT in a folk-psychological agent architecture which stores beliefs and desires in natural language and recruits the GPT to interpret its environment and plan actions. The linked post argues that this has important safety implications. So pointing out that Gato is not so different from a GPT is missing the point in a way that, to my mind, is only really possible if one has not bothered to read the linked research. What is relevant is the architecture in which the GPT is embedded, not the GPT itself.

Reacts can also be downvoted, which results in them being hidden. This is to counter abuse in the same way as voting counters abuse via low-quality comments.

Cool, makes sense. Sounds like I remembered the upper bound for the algorithmic efficiency estimate. Thanks for correcting!

You just copy the link to the market, and if you paste it into an empty new paragraph it should automatically be replaced with an embed.

I think requiring a "common initialization + early training trajectory" is a pretty huge obstacle to knowledge sharing, and would de-facto make knowledge sharing among the vast majority of large language models infeasible. 

I do think stuff like stitching via cross-attention is kind of interesting, but it feels like a non-scalable way of knowledge sharing, unless I am misunderstanding how it works. I don't know much about Knowledge Distillation, so maybe that is actually something that would fit the "knowledge sharing is easy" description (my models he... (read more)

I think requiring a "common initialization + early training trajectory" is a pretty huge obstacle to knowledge sharing, and would de-facto make knowledge sharing among the vast majority of large language models infeasible.

Agreed. That part of my comment was aimed only at the claim about weight averaging only working for diffusion/image models, not about knowledge sharing more generally.

I do think stuff like stitching via cross-attention is kind of interesting, but it feels like a non-scalable way of knowledge sharing, unless I am misunderstanding how

... (read more)

Huh, maybe. My current guess is that things aren't really "compute bottlenecked". It's just the case that we now have profitable enough AI that we really want to have better compute. But if we didn't get cheaper compute, we would still see performance increase a lot as we find ways to improve compute-efficiency the same way we've been improving it a lot over the past 5-10 years, and that for any given period of time, the algorithmic progress is a bigger deal for increasing performance than the degree to which compute got cheaper in the same period.

51a3orn13d
This is true, but as a picture of the past, this is underselling compute by focusing on the cost of compute rather than compute itself. I.e., in the period between 2012 and 2020:
  • Algo efficiency improved 44x, if we use the OpenAI efficiency baseline for AlexNet [https://openai.com/research/ai-and-efficiency].
  • Cost of compute improved by less than 44x, let's say, if we use a reasonable guess based off Moore's law. So algo efficiency was more important than cost per FLOP going down.
  • But, using EpochAI's estimates for a 6-month doubling time [https://epochai.org/blog/compute-trends], total compute per training run increased >10,000x.
So just looking at cost of compute is somewhat misleading. Cost per FLOP went down, but the amount spent went up from just dollars on a training run to tens of thousands of dollars on a training run.
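The scaling comparison above can be sanity-checked with a few lines of arithmetic. This is a sketch: the 44x figure, the ~2-year cost-halving assumption, and the 6-month compute doubling time are all taken from the comment and its linked sources, not independently verified.

```python
# Back-of-the-envelope check of the 2012-2020 scaling claims quoted above.
years = 2020 - 2012  # the 8-year window discussed in the comment

algo_gain = 44                     # algorithmic efficiency gain (OpenAI AlexNet baseline)
cost_gain = 2 ** (years / 2)       # cost per FLOP, assuming a ~2-year halving time: 16x
compute_gain = 2 ** (years / 0.5)  # total training compute, 6-month doubling time: 65,536x

# Algorithmic progress beat cost-per-FLOP improvements...
assert algo_gain > cost_gain
# ...but total compute growth dwarfed both, consistent with the ">10,000x" figure.
assert compute_gain > 10_000
print(algo_gain, int(cost_gain), int(compute_gain))  # 44 16 65536
```

Note that a 6-month doubling over 8 years gives 2^16 ≈ 65,536x, comfortably above the ">10,000x" stated in the comment.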
8tailcalled14d
I'd say usually bottlenecks aren't absolute, but instead quantifiable and flexible based on costs, time, etc.? One could say that we've reached the threshold where we're bottlenecked on inference-compute, whereas previously talk of compute bottlenecks was about training-compute. This seems to matter for some FOOM scenarios since e.g. it limits the FOOM that can be achieved by self-duplicating. But the fact that AI companies are trying their hardest to scale up compute, and are also actively researching more compute-efficient algorithms, means IMO that the inference-compute bottleneck will be short-lived.

If we take this as the disagreement -- will AI progress come from a handful of big insights, or many small ones -- I think the world right now looks a great deal more like Hanson's view than Yudkowsky's. In his interview with Lex Fridman, Sam Altman characterizes GPT-4 as improving on GPT-3 in a hundred little things rather than a few big things, and that's... by far... my impression of current ML progress. So when I interpret their disagreement in terms of the kind of work you need to do before attaining AGI, I tend to agree that Hanson is right.

This also fee... (read more)

7Eli Tyre14d
Here's a market for your claim. GPT-4 performance and compute efficiency from a simple architecture before 2026 [https://manifold.markets/EliTyre/gpt4-performance-and-compute-effici]  

I agree I'm confused here. But it's hard to come down to clear interpretations. I kinda think Hanson and Yudkowsky are also confused.

Like, here are some possible interpretations on this issue, and how I'd position Hanson and Yudkowsky on them based on my recollection and on vibes.

  1. Improvements in our ways of making AI will be incremental. (Hanson pro, Yudkowsky maaaybe con, and we need some way to operationalize "incremental", so probably just ambiguous)
  2. Improvements in our ways of making AI will be made by lots of different people distributed over space
... (read more)

But -- regardless of Yudkowsky's current position -- it still remains that you'd have been extremely surprised by the last decade's use of compute if you had believed him, and much less surprised if you had believed Hanson.

I think you are pointing towards something real here, but also, algorithmic progress is currently outpacing compute growth by quite a bit, at least according to the Epoch AI estimates I remember. I also expect algorithmic progress to increase in importance. 

I do think that some of the deep learning revolution turned out to be kind o... (read more)

6Jsevillamol12d
This is not right, at least in computer vision. They seem to be the same order of magnitude. Physical compute has grown at 0.6 OOM/year and physical compute requirements have decreased at 0.1 to 1.0 OOM/year; see a summary here [https://epochai.org/trends] or an in-depth investigation here [https://epochai.org/blog/revisiting-algorithmic-progress]. Another relevant quote
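To see why these rates count as the same order of magnitude, the OOM/year figures quoted above can be converted into yearly multipliers (a sketch using only the numbers from the comment):

```python
# Convert orders-of-magnitude-per-year growth rates into yearly multipliers.
def yearly_multiplier(oom_per_year: float) -> float:
    """10^x: e.g. a 0.6 OOM/year rate means ~4x growth per year."""
    return 10 ** oom_per_year

compute_growth = yearly_multiplier(0.6)  # physical compute: ~3.98x per year
algo_low = yearly_multiplier(0.1)        # algorithmic progress, low end: ~1.26x per year
algo_high = yearly_multiplier(1.0)       # algorithmic progress, high end: 10x per year

# The compute growth rate falls inside the quoted range for algorithmic progress,
# i.e. the two effects are comparable in magnitude.
assert algo_low < compute_growth < algo_high
```

This is why neither factor clearly dominates the other under the Epoch estimates.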

I do think that some of the deep learning revolution turned out to be kind of compute bottlenecked, but I don't believe this is currently that true anymore

I had kind of the exact opposite impression of compute bottlenecks (that deep learning was not meaningfully compute bottlenecked until very recently). OpenAI apparently has a bunch of products and probably also experiments that are literally just waiting for H100s to arrive. Probably this is mainly due to the massive demand for inference, but still, this seems like a kind of actual hardware bottleneck that i... (read more)

An actual improvement to say, how Transformers work, would help with speech recognition, language modelling, image recognition, image segmentation, and so on and so forth. Improvements to AI-relevant hardware are a trillion-dollar business. Work compounds so easily on other work that many alignment-concerned people want to conduct all AI research in secret.

This section feels like it misunderstands what Yudkowsky is trying to say here, though I am not confident. I expected this point to not be about "what happens if you find an improvement to transformers i... (read more)

Moreover, granting neural networks, trading cognitive content has turned out to be not particularly hard. It does not require superintelligence to share representations between different neural networks; a language model can be adapted to handle visual data without enormous difficulty. Encodings from BERT or an ImageNet model can be applied to a variety of downstream tasks, and this is by now a standard element in toolkits and workflows. When you share architectures and training data, as for two differently fine-tuned diffusion models, you can get semantic

... (read more)

I would also call this one for Eliezer. I think we mostly just retrain AI systems without reusing anything. I think that's what you'd guess on Eliezer's model, and very surprising on Robin's model. The extent to which we throw things away is surprising even to a very simple common-sense observer.

I would have called "Human content is unimportant" for Robin---it seems like the existing ML systems that are driving current excitement (and are closest to being useful) lean extremely heavily on imitation of human experts and mostly don't make new knowledge thems... (read more)

In addition to what cfoster0 said, I'm kinda excited about the next ~2-3 years of cross LLM knowledge transfer, so this seems a differing prediction about the future, which is fun.

My model for why it hasn't happened already is in part just that most models know the same stuff, because they're trained on extremely similar enormous swathes of text, so there's no gain to be had by sticking them together. That would be why more effort goes into LLM / images / video glue than LLM / LLM glue.

But abstractly, a world where LLMs can meaningfully be connected to vi... (read more)

The part where you can average weights is unique to diffusion models, as far as I can tell, which makes sense because the 2-d structure of the images is very local, and so this establishes a strong preferred basis for the representations of different networks.

Exchanging knowledge between two language models currently seems approximately impossible? Like, you can train on the outputs, but I don't think there is really any way for two language models to learn from each other by exchanging any kind of cognitive content, or to improve the internal represen

... (read more)

This feels like it is not really understanding my point, though maybe best to move this to some higher-bandwidth medium if the point is that hard to get across. 

Giving it one last try: What I am saying is that I don't think "conventional notion of preferences" is a particularly well-defined concept, and neither are a lot of other concepts you are using in order to make your predictions here. What it means to care about the preferences of others is a thing with a lot of really messy details that tend to blow up in different ways when you think harder a... (read more)

3Vladimir_Nesov12d
Zeroth approximation of pseudokindness is strict nonintervention, reifying the patient-in-environment as a closed computation and letting it run indefinitely, with some allocation of compute. Interaction with the outside world creates vulnerability to external influence, but then again so does incautious closed computation [https://www.lesswrong.com/posts/fDk9hLDpjeT9gZH6h/membranes-is-better-terminology-than-boundaries-alone?commentId=ARqHmuASX2gRqz42k], as we currently observe with AI x-risk, which is not something beamed in from outer space. Formulation of the kinds of external influences that are appropriate for a particular patient-in-environment is exactly the topic of membranes/boundaries [https://www.lesswrong.com/tag/boundaries-membranes-technical], this task can be taken as the defining desideratum for the topic. Specifically, the question of which environments [https://www.lesswrong.com/posts/JYon9nenijSWpGezg/what-is-inside-an-agent-s-membrane-a-brief-abstract-model?commentId=ybLsSoMFCFH3a3EMr] can be put in contact with a particular membrane without corrupting it, hence why I think membranes are relevant to pseudokindness [https://www.lesswrong.com/posts/Htu55gzoiYHS6TREB/sentience-matters?commentId=ab94QHwzAaDmrvEXK]. Naturality of the membranes/boundaries abstraction is linked to naturality of the pseudokindness abstraction. In contrast, the language of preferences/optimization seems to be the wrong frame for formulating pseudokindness, it wants to discuss ways of intervening and influencing, of not leaving value on the table, rather than ways of offering acceptable options that avoid manipulation. It might be possible to translate pseudokindness back into the language of preferences, but this translation would induce a kind of deontological prior on preferences [https://www.lesswrong.com/posts/87EzRDAHkQJptLthE/but-why-would-the-ai-kill-us?commentId=KuQmgKwyTnquddJLF] that makes the more probable preferences look rather surprising/unnatural from a

I think some of the confusion here comes from my using "kind" to refer to "respecting the preferences of existing weak agents," I don't have a better handle but could have just used a made up word.

Yeah, sorry, I noticed the same thing a few minutes ago, that I was probably at least somewhat misled by the more standard meaning of kindness. 

Tabooing "kindness" I am saying something like: 

Yes, I don't think extrapolated current humans assign approximately any value to the exact preference of "respecting the preferences of existing weak agents" and I... (read more)

3TurnTrout4d
I feel pretty uncertain of what assumptions are hiding in your "optimize strongly against X" statements. Historically this just seems hard to tease out, and I wouldn't be surprised if I were just totally misreading you here. That said, your writing makes me wonder "where is the heavy optimization [over the value definitions] coming from? [https://www.lesswrong.com/posts/yTvBSFrXhZfL8vr5a/worst-case-thinking-in-ai-alignment]", since I think the preference-shards themselves are the things steering the optimization power. For example, the shards are not optimizing over themselves to find adversarial examples to themselves [https://www.lesswrong.com/posts/rauMEna2ddf26BqiE/alignment-allows-nonrobust-decision-influences-and-doesn-t]. Related statements:
  • I think that a realistic "respecting preferences of weak agents"-shard doesn't bid for plans which maximally activate the "respect preferences of weak agents" internal evaluation metric, or even do some tight bounded approximation thereof.
  • A "respect weak preferences" shard might also guide the AI's value and ontology reformation process.
  • A nice person isn't being maximally nice, nor do they wish to be; they are nicely being nice.
I do agree (insofar as I understand you enough to agree) that we should worry about some "strong optimization over the AI's concepts, later in AI developmental timeline." But I think different kinds of "heavy optimization" lead to different kinds of alignment concerns.
3ryan_greenblatt15d
When I try to interpret your points here, I come to the conclusion that you think humans, upon reflection, would cause human extinction (in favor of resources being used for something else). Or at least that many/most humans would, upon reflection, prefer resources to be used for purposes other than preserving human life (including not preserving human life in simulation). And this holds even if (some of) the existing humans 'want' to be preserved (at least according to a conventional notion of preferences). I think this empirical view seems pretty implausible. That said, I think it's quite plausible that upon reflection, I'd want to 'wink out' any existing copies of myself in favor of using resources for better things. But this is partially because I personally (in my current state) would endorse such a thing: if my extrapolated volition thought it would be better to not exist (in favor of other resource usage), my current self would accept that. And, I think it currently seems unlikely that upon reflection, I'd want to end all human lives (in particular, I think I probably would want to keep humans alive who had preferences against non-existence). This applies regardless of trade; it's important to note this to avoid a 'perpetual motion machine' type argument. Beyond this, I think that most or many humans or aliens would, upon reflection, want to preserve currently existing humans or aliens who had a preference against non-existence. (Again, regardless of trade.) Additionally, I think it's quite plausible that most or many humans or aliens will enact various trades or precommitments prior to reflecting (which is probably ill-advised, but it will happen regardless). So current preferences which aren't stable under reflection might have a significant influence overall.
5the gears to ascension15d
I am quite confident that I do, and it tends to infuriate my friends, who get cranky that I feel a moral obligation to respect the artistic intent of bacterial genomes: all bacteria should go vegan, yet survive, and eat food equivalent to their previous diet.

If the result of an optimization process will be predictably horrifying to the agents which are applying that optimization process to themselves, then they will simply not do so.

In other words: AIs which feel anything in the vicinity of kindness before applying cosmic amounts of optimization pressure to themselves will try to steer that optimization pressure towards something which is recognizably kind at the end.

And I don't think there's any good argument for why AIs will lack any scrap of kindness with very high confidence at the point where they're just

... (read more)

Meta: I feel pretty annoyed by the phenomenon of which this current conversation is an instance: when people keep saying things that I strongly disagree with, and which will be taken as representing a movement that I'm associated with, the high-integrity (and possibly also strategically optimal) thing to do is to publicly repudiate those claims*, which seems like a bad outcome for everyone.

For what it's worth, I think you should just say that you disagree with it? I don't really understand why this would be a "bad outcome for everyone". Just list out th... (read more)

8Richard_Ngo16d
When I say "repudiate" I mean a combination of publicly disagreeing + distancing. I presume you agree that this is suboptimal for both of us, and my comment above is an attempt to find a trade that avoids this suboptimal outcome. Note that I'm fine to be in coalitions with people when I think their epistemologies have problems, as long as their strategies are not sensitively dependent on those problems. (E.g. presumably some of the signatories of the recent CAIS statement are theists, and I'm fine with that as long as they don't start making arguments that AI safety is important because of theism.) So my request is that you make your strategies less sensitively dependent on the parts of your epistemology that I have problems with (and I'm open to doing the same the other way around in exchange).

Humans might respect the preferences of weak agents right now, but if they thought about it for longer they'd pretty robustly just want to completely destroy the existing agents (including a hypothetical alien creator) and replace them with something better. No reason to honor that kind of arbitrary path dependence.

No, this doesn't feel accurate. What I am saying is more something like: 

The way humans think about the question of "preferences for weak agents" and "kindness" feels like the kind of thing that will come apart under extreme optimization, i... (read more)

8paulfchristiano16d
I think some of the confusion here comes from my using "kind" to refer to "respecting the preferences of existing weak agents"; I don't have a better handle, but could have just used a made-up word.

I don't quite understand your objection to my summary---it seems like you are saying that notions like "kindness" (that might currently lead you to respect the preferences of existing agents) will come apart and change in unpredictable ways as agents deliberate. The result is that smart minds will predictably stop respecting the preferences of existing agents, up to and including killing them all to replace them with something that more efficiently satisfies other values (including whatever kind of form "kindness" may end up taking, e.g. kindness towards all the possible minds who otherwise won't get to exist).

I called this utilitarian optimization but it might have been more charitable to call it "impartial" optimization. Impartiality between the existing creatures and the not-yet-created creatures seems like one of the key characteristics of utilitarianism while being very rare in the broader world. It's also "utilitarian" in the sense that it's willing to spare nothing (or at least not 1/trillion) for the existing creatures, and this kind of maximizing stance is also one of the big defining features of utilitarianism. So I do still feel like "utilitarian" is an OK way of pointing at the basic difference between where you expect intelligent minds will end up vs how normal people think about concepts like being nice.

Might write a longer reply at some point, but the reason why I don't expect "kindness" in AIs (as you define it here) is that I don't expect "kindness" to be the kind of concept that is robust to cosmic levels of optimization pressure applied to it, and I expect will instead come apart when you apply various reflective principles and eliminate any status-quo bias, even if it exists in an AI mind (and I also think it is quite plausible that it is completely absent). 

Like, different versions of kindness might or might not put almost all of their conside... (read more)


4Vladimir_Nesov16d
Relevant sense of kindness is towards things that happen to already exist, because they already exist. Not filling some fraction of the universe with expression-of-kindness, brought into existence de novo, that's a different thing.

Is this a fair summary?

Humans might respect the preferences of weak agents right now, but if they thought about it for longer they'd pretty robustly just want to completely destroy the existing agents (including a hypothetical alien creator) and replace them with something better. No reason to honor that kind of arbitrary path dependence.

If so, it seems like you wouldn't be making an argument about AI or aliens at all, but rather an empirical claim about what would happen if humans were to think for a long time (and become more the people we wished to be a... (read more)

Almost certainly related to that email controversy from a few months ago. My sense is people have told him (or he has himself decided) to take a step back from public engagement. 

I think I disagree with this, but it's not a totally crazy call, IMO.

1dr_s15d
Yeah, and beyond that, honestly, I would worry that his politics in general might do even more to polarize the issue in an undesirable way. I think it's not necessarily a bad call in the current atmosphere.
3ROM15d
I think this explains his absence from this + the FLI letter. He still seems to be doing public outreach, though: see an interview with the NY Times [https://archive.is/sfdQL], an interview with RTE [https://www.rte.ie/news/upfront/2023/0306/1360566-should-we-be-concerned-about-the-rise-of-artificial-intelligence/], a Big Think video [https://www.youtube.com/watch?v=1WcpN4ds0iY], and an interview with Analytics India Magazine [https://analyticsindiamag.com/pain-in-the-ais-by-nick-bostrom/]. None of these interviews have discussed the email.
1Vishrut Arya16d
Aha. Ugh, what an unfortunate sequence of events.
2Nathan Helm-Burger17d
Well, the main reason I haven't just used the LW dark mode is that I like to have the same text and background colors unified across all my websites. But switching to dark mode in addition to my dark-mode plugin did change the contrast ratio of the react symbols, which fixed my problem! So thanks for the suggestion.
4gwern18d
Or the GW themes and/or dark-modes.

This feels game-theoretically pretty bad to me, and not only abstractly: I concretely expect that setting up this incentive will cause a bunch of people to attempt to go into capabilities (based on conversations I've had in the space).

4NicholasKross15d
For this incentives-reason, I wish hardcore-technical-AI-alignment had a greater support-infrastructure for independent researchers and students. Otherwise, we're often gonna be torn between "learning/working for something to get a job" and "learning AI alignment background knowledge with our spare time/energy". Technical AI alignment is one of the few important fields that you can't quite major in, and whose closest-related jobs/majors make the problem worse. As much as agency is nice, plenty of (useful!) academics out there don't have the kind of agency/risk-taking-ability that technical alignment research currently demands as the price-of-entry. This will keep choking us off from talent. Many of the best ideas will come from sheltered absentminded types, and only the LTFF and a tiny number of other groups give (temporary) support to such people.
2James_Miller19d
Yes, important to get the incentives right. You could set the salary for AI alignment slightly below the worker's market value. Also, I wonder about the relevant elasticity: how many people have the capacity to get good enough at programming to be able to contribute to capabilities research + would have the desire to game my labor-hoarding system because they don't have really good employment options?

We had an earlier iteration of the design where each react was basically a dimension where it made sense to have positives and negatives, and it IMO constrained the space of reacts too much. 

The primary point of the anti-react system is as a corrective system that I expect to be used relatively rarely (but that I do think is important to exist). While I agree that some reactions have meaningful opposites that one might be tempted to express with an anti-react, the right thing to do IMO is to provide another react with the opposite meaning, so that you can see them both side-by-side.

1jam_brand21d
This seems right to me since, e.g., if someone were to use anti-excitement to indicate "this is draining", there'd then be an issue of how someone else might see this and wonder how best to express that they think it's actually pretty neutral rather than draining (since, while excitement cancels out anti-excitement, indicating excitement itself wouldn't be truth-tracking in this case).

I think I changed my mind on this, FWIW, after playing around more in this thread. I think bottom-left is indeed better.

I don't think it means anything (it's also not a particularly accessible state for the UI to end up in, since you have to first react, and then anti-react). It is kind of a state that's hard to avoid, though: maybe you anti-reacted to something after someone left a react, and then they withdrew it, in which case it seems bad to just throw away the information that you thought it was a bad react, and more appropriate to "apply" it whenever the next person leaves the same react again.

6Said Achmiz23d
It seems like an awkward bit of information-architecture design, though, doesn’t it?

I mean, for some of the reactions, it does, actually, make sense to anti-react to them directly, “from scratch”, as it were. Anti-“Insightful” clearly means “not insightful”, anti-“Virtue of Scholarship” can mean “this should exhibit the virtue of scholarship but fails to do so”, anti-“Clear” and anti-“Hits the Mark” and anti-“Exciting” also all have fairly clear meanings even when not reacting to their regular (non-reversed) versions.

Now, for one thing, that this is the case for some of the reacts but not others seems like it’s bound to lead to confusion and weirdness. For another thing, it seems like directly anti-reacting with the reactions I list above should be fairly easy via the UI, given that it’s clearly meaningful to do so. But this would also (as currently designed) make it easier to directly anti-react with “Wrong” or “Shrug” or whatever, which seems less than ideal.

This seems to me to suggest that the conceptual design of the feature might need some work.

Currently the most-reacted-to icons are on the very right of the list. This feels like it's the wrong way around: I want to notice the most-reacted icons first, which means they should be on the left side.

2Measure22d
I would rather match Discord's timestamp sorting. Duplicates grouped with the original and new reacts added to the end.
3jimrandomh22d
Currently they aren't sorted at all (so the order is some arbitrary emergent property, which I haven't reverse-engineered but which might be "sort by least recently applied"). I agree that sorting by descending count makes sense and will change it to that.
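To make the intended ordering concrete, here's a minimal sketch of "sort by descending count, with ties broken by when the react was first applied" (which would also roughly accommodate the Discord-style suggestion above). All names and the data shape are made up for illustration; LessWrong's actual implementation surely differs:

```python
def sort_reacts(reacts):
    """Order reacts by descending count; break ties by earliest first application.

    `reacts` is a list of dicts with hypothetical keys
    'name', 'count', and 'first_applied_at' (a sortable timestamp).
    """
    # Negating the count gives descending order while keeping the
    # timestamp tiebreak ascending (earliest-applied react first).
    return sorted(reacts, key=lambda r: (-r["count"], r["first_applied_at"]))


reacts = [
    {"name": "insightful", "count": 2, "first_applied_at": 10},
    {"name": "exciting", "count": 5, "first_applied_at": 30},
    {"name": "clear", "count": 2, "first_applied_at": 5},
]
print([r["name"] for r in sort_reacts(reacts)])
# → ['exciting', 'clear', 'insightful']
```

Since Python's `sorted` is stable, reacts with equal counts and equal timestamps would also keep their original relative order.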
2Raemon23d
It’s not obvious to me which side is more noticeable

Bug: I can't undo my vote in the react/anti-react voting widget. When I click on the upvote/downvote buttons it just reapplies the vote instead of undoing it.

Having some way to view whether I've already left a reaction on a post would be great. Currently it just shows a number, and then if I click on it, the number decreases if I've already left a reaction. Would be nice for the background to be some color if I left the relevant reaction (maybe green if I pro-reacted and red if I anti-reacted).

4Vaniver22d
(The "I saw this" reaction doesn't show up on the parent comment, because of the anti-reaction, which seems weird.)

Mod note: It felt fine to do this once or twice, but it's not an intended use-case of AI Alignment Forum membership to post to the AI Alignment Forum with content that you didn't write. 

I would have likely accepted this submission to the AI Alignment Forum anyways, so it seems best to just go via the usual submission channels. I don't want to set a precedent of weirdly confusing co-authorship for submission purposes. You can also ping me on Intercom in advance if you want to know ahead of time whether the post fits on the AIAF, or want to make sure it goes live there immediately. 

2Dan H23d
I asked for permission via Intercom to post this series on March 29th. Later, I asked for permission to use the [Draft] indicator and said it was written by others. I got permission for both of these requests, though not from the same person. Apologies this was not consolidated into one big ask with lots of context. (Feel free to get rid of any undue karma.)
2[comment deleted]23d

I sure did expect more text for this question. Is there any specific reason why it should have worked out better than it did? Seems like it was pretty great for a lot of it, but wasn't enough to fully close the gap between them and the rest of Europe. Is the question why they didn't fully close the gap?

2lc25d
I'm going to pull this and then repost with more data. While I am anticommunist and anti-authoritarian, most of the QoL indicators were very underwhelming given my priors and I'd like to know more about why.

Mod note: I removed Dan H as a co-author since it seems like that was more used as convenience for posting it to the AI Alignment Forum. Let me know if you want me to revert.

Huh, interesting. Seems good to get an HTML version then, since in my experience PDFs have a pretty sharp dropoff in readership. 

When I google the title of the paper literally the only hit is this LessWrong post. Do you know where the paper was posted and whether there exists an HTML version (or a LaTeX, or a Word, or a Google Doc version)?

2Zach Stein-Perlman1mo
It was posted at https://arxiv.org/abs/2305.07153 [https://arxiv.org/abs/2305.07153]. I'm not aware of versions other than the pdf.

If the difference between these papers is: we do activations, they do weights, then I think that warrants more conceptual and empirical comparisons.

Yeah, it's totally possible that, as I said, there is a specific other paper that is important to mention or where the existing comparison seems inaccurate. This seems quite different from a generic "please have more thorough related work sections" request like the one you make in the top-level comment (which, my guess is, was mostly based on your misreading the post as having a related work section that only spans two paragraphs). 

7Dan H1mo
Yes, I'll tend to write up comments quickly so that I don't feel as inclined to get in detailed back-and-forths and use up time, but here we are. When I wrote it, I thought there were only 2 things mentioned in the related works until Daniel pointed out the formatting choice, and when I skimmed the post I didn't easily see comparisons or discussion that I expected to see, hence I gestured at needing more detailed comparisons. After posting, I found a one-sentence comparison of the work I was looking for, so I edited to include that I found it, but it was oddly not emphasized. A more ideal comment would have been "It would be helpful to me if this work would more thoroughly compare to (apparently) very related works such as ..."

The level of comparison between the present paper and this paper seems about the same as I see in papers you have been a co-author on. 

E.g. in https://arxiv.org/pdf/2304.03279.pdf the Related Works section is basically just a list of papers, with maybe half a sentence describing their relation to the paper. This seems normal and fine, and I don't see even papers you are a co-author on doing something substantively different here (this is again separate from whether there are any important papers omitted from the list of related works, or whether any s... (read more)

3Dan H1mo
In many of my papers, there aren't fairly similar works (I strongly prefer to work in areas before they're popular), so there's a lower expectation for comparison depth, though breadth is always standard. In other works of mine, such as this paper [https://arxiv.org/pdf/1802.05300.pdf] on learning the right thing in the presence of extremely bad supervision/extremely bad training objectives, we contrast with the two main related works for two paragraphs, and compare to these two methods for around half of the entire paper. The extent of an adequate comparison depends on the relatedness. I'm of course not saying every paper in the related works needs its own paragraph. If they're fairly similar approaches, there usually also need to be empirical juxtapositions as well. If the difference between these papers is: we do activations, they do weights, then I think that warrants more in-depth conceptual comparisons or, preferably, many empirical comparisons.
habryka1moΩ41110

I don't understand this comment. I did a quick count of related works mentioned in the "Related Works" section (and the footnotes of that section) and got around 10 works, so it seems like this is meeting your pretty arbitrarily established bar, and there are also lots of footnotes and references to related work sprinkled all over the post, which seems like the better place to discuss related work anyways.

I am not familiar enough with the literature to know whether this post is omitting any crucial pieces of related work, but the relevant section of ... (read more)

8Dan H1mo
Background for people who understandably don't habitually read full empirical papers: Related Works sections in empirical papers tend to include many comparisons in a coherent place. This helps contextualize the work and helps busy readers quickly identify if this work is meaningfully novel relative to the literature. Related works sections must therefore also give a good account of the literature. This helps us more easily understand how much of an advance this is.

I've seen a good number of papers steering with latent arithmetic in the past year, but I would not be surprised if this is the first time many readers of AF/LW have seen it, which would make this paper seem especially novel. A good related works section would more accurately and quickly communicate how novel this is. I don't think this norm is gatekeeping nor pedantic; it becomes essential when the number of papers becomes high.

The total number of cited papers throughout the paper is different from the number of papers in the related works. If a relevant paper is buried somewhere randomly in a paper and not contrasted with explicitly in the related works section, that is usually penalized in peer review.