Posts

2y

7 Consequentialism is in the Stars not Ourselves

192

1y

19

23 Is "Strong Coherence" Anti-Natural?

1y

18 Feature Request: Right Click to Copy LaTeX

25

1y

4

51 Beren's "Deconfusing Direct vs Amortised Optimisation"

1y

32 Is "Recursive Self-Improvement" Relevant in the Deep Learning Paradigm?

10

1y

21 Orthogonality is Expensive

36

1y

3

38 "Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman

1y

32

68 Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Microsoft Research

1y

23

39 Contra "Strong Coherence"

1y

24

11 Incentives and Selection: A Missing Frame From AI Threat Discussions?

1y

16

Wiki Contributions

Feature request

1y

Selection Theorems

2y

(-1)

Selection Theorems

2y

(+1263)

Comments

Uncertainty in all its flavours

DragonGod6moΩ120

i.e. if each forecaster has an first-order belief $f (w) \in B (S)$ , and $w \in B (S)$ is your second-order belief about which forecaster is correct, then $(w ⊳_{W S} f) \in B (S)$ should be your first-order belief about the election.

I think there might be a typo here. Did you instead mean to write "$w \in B (W)$" for the second-order belief about the forecasters?
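To make the type question concrete, here is a minimal sketch of the composition under one reading. It assumes (my assumption, not something the quoted post states) that $B(X)$ means finite probability distributions over $X$, so that $w \in B(W)$ is a distribution over forecasters and $f$ maps each forecaster to a distribution over outcomes. The names `mix`, `alice`, and `bob` are hypothetical.

```python
from typing import Dict

# Assumption: a "belief" is a finite probability distribution, stored as a dict.
Dist = Dict[str, float]


def mix(w: Dist, f: Dict[str, Dist]) -> Dist:
    """Combine a second-order belief w over forecasters (an element of B(W))
    with each forecaster's first-order belief f[name] (an element of B(S))
    into a single first-order belief over outcomes (an element of B(S))."""
    out: Dist = {}
    for name, weight in w.items():
        for outcome, p in f[name].items():
            # Weight each forecaster's probability by how much we trust them.
            out[outcome] = out.get(outcome, 0.0) + weight * p
    return out


# Hypothetical forecasters and election outcomes.
f = {
    "alice": {"win": 0.5, "lose": 0.5},
    "bob": {"win": 0.25, "lose": 0.75},
}
w = {"alice": 0.5, "bob": 0.5}  # keys are forecasters, so w lives in B(W)

belief = mix(w, f)
print(belief)
```

Under this reading the keys of `w` have to be forecasters rather than outcomes, which is why $w \in B(W)$ type-checks and $w \in B(S)$ does not.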

Order Matters for Deceptive Alignment

DragonGod1y51

The claim is that, given the presence of differential adversarial examples, the optimisation process would adjust the parameters of the model such that its optimisation target is the base goal.

DragonGod1y20

That was it, thanks!

DragonGod1y20

Probably sometime last year, I posted on Twitter something like: "agent values are defined on agent world models" (or similar) with a link to a LessWrong post (I think the author was John Wentworth).

I'm now looking for that LessWrong post.

My Twitter account is private and search is broken for private accounts, so I haven't been able to track down the tweet. If anyone has a guess as to which post I may have been referring to, please send it my way.

DragonGod1y90

Most of the catastrophic risk from AI still lies in superhuman agentic systems.

Current frontier systems are not that (and IMO not poised to become that in the very immediate future).

I think AI risk advocates should be clear that they're not saying GPT-5/Claude Next is an existential threat to humanity.

[Unless they actually believe that. But if they don't, I'm a bit concerned that their message is being rounded up to that, and when such systems don't reveal themselves to be catastrophically dangerous, it might erode their credibility.]

DragonGod1y60

Immigration is such a tight constraint for me.

My next career steps after I'm done with my TCS Masters are primarily bottlenecked by "what allows me to remain in the UK" and then "keeps me on track to contribute to technical AI safety research".

What I would like to do for the next 1-2 years ("independent research"/"further upskilling to get into a top ML PhD program") is not all that viable a path given my visa constraints.

Above all, I want to avoid wasting N more years by taking a detour through software engineering again so I can get visa sponsorship.

[I'm not conscientious enough to pursue AI safety research/ML upskilling while managing a full-time job.]

Might just try and see if I can pursue a TCS PhD at my current university and do TCS research that I think would be valuable for theoretical AI safety research.

The main detriment of that is I'd have to spend N more years in <city> and I was really hoping to come down to London.

Advice very, very welcome.

[Not sure who to tag.]

Hedonic Loops and Taming RL

DragonGod1y20

Specifically, the experiments by Morrison and Berridge demonstrated that by intervening on the hypothalamic valuation circuits, it is possible to adjust policies zero-shot such that the animal has never experienced a previously repulsive stimulus as pleasurable.

I find this a bit confusing as worded, is something missing?

DragonGod1y20

Does anyone know a ChatGPT plugin for browsing documents/webpages that can read LaTeX?

The plugin I currently use (Link Reader) strips out the LaTeX in its payload, and so GPT-4 ends up hallucinating the LaTeX content of the pages I'm feeding it.

Ruby's Quick Takes

DragonGod1y20

How frequent are moderation actions? Is this discussion about saving moderator effort (by banning someone before you have to remove the rate-limited quantity of their bad posts), or something else? I really worry about "quality improvement by prior restraint" - both because low-value posts aren't that harmful, they get downvoted and ignored pretty easily, and because it can take YEARS of trial-and-error for someone to become a good participant in LW-style discussions, and I don't want to make it impossible for the true newbies (young people discovering this style for the first time) to try, fail, learn, try, fail, get frustrated, go away, come back, and be slightly-above-neutral for a bit before really hitting their stride.

I agree with Dagon here.

Six years ago, after discovering HPMOR and reading part (most?) of the Sequences, I was a bad participant in old LW and rationalist subreddits.

I would probably have been quickly banned on current LW.

It really just takes a while for people new to LW-like norms to adjust.

DragonGod1y90

I find noticing surprise more valuable than noticing confusion.

Hindsight bias and post hoc rationalisations make it easy for us to gloss over events that were a priori unexpected.