LESSWRONG
Julian_R

Comments

Elizabeth's Shortform
Julian_R · 1mo

I recently learned that the Starship Troopers movie started out like this. 

To quote Wikipedia:

Development of Starship Troopers began in 1991 as Bug Hunt at Outpost 7, written by Neumeier. After recognizing similarities between Neumeier's script and Heinlein's book, producer Jon Davison suggested aligning the script more closely with the novel to garner greater interest from studio executives.

Reward is not the optimization target
Julian_R · 1mo

I suspect the clearest way to think about this is to carefully distinguish between the RL “agent” as defined by a learned policy (a mapping from states to actions) and the RL algorithm used to train that policy.

The RL algorithm is designed to create an agent which maximises reward.

The “goal” of an RL policy may not always be clear, but using Dennett’s intentional stance we can define it as “whatever it makes sense to say the policy appears to be maximising, in the sense that this description best compresses our observations of its behaviour”.

Then I understand this post to be saying “The goal of an RL policy is not necessarily the same as the goal of the RL algorithm used to train it.”


Is that right?
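To make the distinction concrete, here is a toy sketch (my own illustration, not anything from the post; the crude preference-update rule stands in for a real RL algorithm): reward appears only inside the training loop, while the trained policy is just a state-to-action mapping that never represents reward.

```python
import random

class Policy:
    """A tiny tabular policy: per-state action preferences, no reward term."""
    def __init__(self, n_states, n_actions):
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]

    def act(self, state):
        # Greedy over current preferences, random tie-break.
        prefs = self.prefs[state]
        best = max(prefs)
        return random.choice([a for a, p in enumerate(prefs) if p == best])

def train(policy, env_step, episodes=1000, lr=0.1):
    """Crude stand-in for an RL algorithm: bump the preference of an action
    in proportion to the reward it received. Reward exists only here."""
    for _ in range(episodes):
        state = 0                      # single-state toy environment
        action = policy.act(state)
        reward = env_step(state, action)
        policy.prefs[state][action] += lr * reward

# Toy environment: only action 1 is rewarded.
policy = Policy(n_states=1, n_actions=2)
train(policy, env_step=lambda s, a: 1.0 if a == 1 else 0.0)
print(policy.act(0))  # -> 1: behaviour shaped by reward without representing it
```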

Extended Interview with Zhukeepa on Religion
Julian_R · 1y

Thank you for recording and posting these. I feel like I learned a lot, both about how to have conversations and from lots of little details: the restaurant thing as a proto preference synthesizer, the trauma/cancer analogy, the Muhammad story, and the disendorsing-all-judgements/resentments thing.

How to get nerds fascinated about mysterious chronic illness research?
Julian_R · 1y

I wonder if, much like young people not thinking clearly about mortality, it's simply something healthy people don't tend to think about, partly because it's depressing.

(I'm also someone who got a lot more interested in this kind of thing after my own health issues)

Am I going insane or is the quality of education at top universities shockingly low?
Julian_R · 2y

Re institutional incentives: I've heard that part of the US News rankings is based on asking survey respondents to evaluate other universities by reputation. Professors elsewhere can only (and do) evaluate other professors based on the quality of their research, not their teaching.

I'm curious, did you check what the quality of teaching would be like at your university before you went? If not, why? If so, why did you pick it anyway?

Clarifying the palatability theory of obesity
Julian_R · 4y

To clarify: I don't understand why a positive calorie balance (calories in exceeding calories out) can increase your weight set point while a negative balance can't decrease it.

Clarifying the palatability theory of obesity
Julian_R · 4y

Guyenet suspects that our brain's weight set point might never go down dramatically after living long enough in the modern world, even if we eventually stop eating palatable food altogether. If true, this would make his theory harder to test, and again, his theory would earn a penalty for being more unfalsifiable, but at the same time, we should be clear about what observations his theory strongly predicts, and rapid weight loss on unpalatable diets is just not one of them.

I don't understand how CICO can coexist with the idea of a weight set point. If the mechanism of gaining weight is CICO via overeating because food is so palatable, then it seems natural that on unpalatable food you would eat less, and thus I would expect rapid weight loss on unpalatable diets to be a prediction of the theory.

Redwood Research’s current project
Julian_R · 4y

I was confused by Buck's response here because I thought we were going for worst-case quality until I realised:

  1. The model will have low quality on those prompts almost by definition - that's the goal.
  2. Given that, we also want to have a generally useful model - for which the relevant distribution is "all fanfiction", not "prompts that are especially likely to have a violent continuation".

In between those two cases is 'snippets that were completed injuriously in the original fanfic ... but could plausibly have non-violent completions', which seems like the interesting case to me.

I suppose one possibility is to construct a human-labelled dataset of specifically these cases to evaluate on.
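For concreteness, a toy sketch of what evaluating on such a dataset might look like (entirely my own illustration; the function names, stub model, and stub judgement are hypothetical stand-ins, not Redwood's actual setup):

```python
# Evaluate a filtered generation policy on a human-curated set of "borderline"
# prompts: snippets whose original fanfic continuation was violent but which
# plausibly admit non-violent completions.
from typing import Callable, List

def borderline_violence_rate(
    generate: Callable[[str], str],         # the filtered generation policy
    judged_violent: Callable[[str], bool],  # human judgement of a completion
    borderline_prompts: List[str],          # the human-labelled borderline cases
) -> float:
    """Fraction of borderline prompts whose completion is still judged
    violent (lower is better). Complements average-case quality measured
    on the 'all fanfiction' distribution."""
    hits = sum(judged_violent(generate(p)) for p in borderline_prompts)
    return hits / max(len(borderline_prompts), 1)

# Usage, with stubs standing in for the real model and the human judge:
rate = borderline_violence_rate(
    generate=lambda p: p + " ...",      # stub completion
    judged_violent=lambda text: False,  # stub judgement
    borderline_prompts=["example snippet"],
)
print(f"violent completion rate on borderline cases: {rate:.2%}")
```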

Posts

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default? (3y)
Julian_R's Shortform (5y)