Introduction to Towards Causal Foundations of Safe AGI
By Tom Everitt, Lewis Hammond, Rhys Ward, Ryan Carey, James Fox, Sebastian Benthall, Matt MacDermott and Shreshth Malik representing the Causal Incentives Working Group. Thanks also to Toby Shevlane, MH Tessler, Aliya Ahmad, Zac Kenton, Maria Loks-Thompson, and Alexis Bellot.

Over the next few years, society, organisations, and individuals will face a number of fundamental questions stemming from the rise of advanced AI systems:

* How do we make sure that advanced AI systems do what we want them to (the alignment problem)?
* What makes a system safe enough to develop and deploy, and what constitutes sufficient evidence of that?
* How do we preserve our autonomy and control as decision making is increasingly delegated to digital assistants?

A causal perspective on agency provides conceptual tools for navigating the above questions, as we'll explain in this sequence of blog posts. We will minimise and explain jargon, to keep the sequence accessible to researchers from a range of backgrounds.

Agency

First, by agent we mean a goal-directed system that acts as if it is trying to steer the world in some particular direction(s). Examples include animals, humans, and organisations (more on agents in a subsequent post). Understanding agents is key to the above questions.

Artificial agents are widely considered the primary existential threat from AGI-level technology, whether they emerge spontaneously or through deliberate design. Among the myriad risks to our existence, highly capable agents pose a distinct danger, because many goals can be achieved more effectively by accumulating influence over the world. Whereas an asteroid moving towards Earth isn't intending to harm humans and won't resist redirection, misaligned agents might be distinctly adversarial and active threats.

Second, the preservation of human agency is critical in the approaching technological transition, for both individuals and collectives. Concerns have already been raised