Developmental Stages of GPTs

The outer optimizer is the more obvious thing: it's straightforward to say there's a big difference in dealing with a superhuman Oracle AI with only the goal of answering each question accurately, versus one whose goals are only slightly different from that in some way.

GPT generates text by repeatedly picking whichever word seems to have the highest probability given all the words that came before. So if its notion of "highest probability" is almost, but not quite, answering every question accurately, I would expect a system which usually answers questions accurately but sometimes answers them inaccurately. That doesn't sound very scary?

Developmental Stages of GPTs

esp. since GPT-3's 0-shot learning looks like mesa-optimization

Could you provide more details on this?

Sometimes people will give GPT-3 a prompt with some examples of inputs along with the sorts of responses they'd like to see from GPT-3 in response to those inputs ("few-shot learning", right? I don't know what 0-shot learning you're referring to.) Is your claim that GPT-3 succeeds at this sort of task by doing something akin to training a model internally?

If that's what you're saying... that seems unlikely to me. GPT-3 is essentially a stack of 96 transformer layers, right? So if it were doing something like gradient descent internally, how many consecutive iterations would it be capable of doing? It seems more likely to me that GPT-3 simply learns sufficiently rich internal representations that, when the input/output examples are within its context window, it picks up their input/output structure and forms a sufficiently sophisticated conception of that structure that the word scoring highest according to next-word prediction is a word which comports with it.
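One way to see how a fixed forward pass could "pick up the input/output structure" with no internal training loop at all: a single attention-like pass over the context that matches the new input against the earlier example inputs and copies the paired output. This is a toy illustration of the idea, not a claim about GPT-3's actual circuits, and the similarity function is an arbitrary stand-in for whatever representation the network has learned:

```python
# Toy "in-context learning" with zero internal training: one attention-like
# pass over (input, output) example pairs sitting in the context. Nothing is
# updated anywhere; all of the adaptation lives in the context itself.

def similarity(a, b):
    # Crude fixed similarity: Jaccard overlap of character sets (a hypothetical
    # stand-in for a learned representation).
    return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)

def predict_from_context(examples, query):
    # Weight each example by how well its input matches the query, then copy
    # the best-matching example's output -- a hard-attention lookup.
    best = max(examples, key=lambda ex: similarity(ex[0], query))
    return best[1]

context = [("chat", "cat"), ("chien", "dog"), ("oiseau", "bird")]
print(predict_from_context(context, "chatte"))  # cat
```

The "model" here is frozen; it still behaves as if it learned the French-to-English pattern, purely by exploiting structure present in the context window.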

96 layers would appear to offer a very limited budget for any kind of serial computation, but there's a lot of parallel computation going on, and there are non-gradient-descent optimization algorithms (genetic algorithms, say) that can be parallelized. I guess the query matrix could be used to implement some kind of fitness function? It would be interesting to try some kind of layer-wise pretraining on transformer blocks, training them to compute steps in a parallelizable optimization algorithm (probably you'd want a deterministic parallelizable algorithm rather than a stochastic one like genetic algorithms). Then you could inspect the resulting network and try to figure out what the telltale signs of a mesa-optimizer are (since this network is almost certainly implementing a mesa-optimizer).
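The "limited serial budget" point can be made concrete by unrolling an optimizer into a fixed depth: each "layer" performs one synchronous update over an entire population in parallel, so depth buys serial optimization steps while width buys population size. A toy sketch (the objective and constants are arbitrary, and this is not a claim about what transformer blocks actually compute):

```python
# Unroll a deterministic, parallelizable optimizer into a fixed "depth":
# each layer does one synchronous gradient step on every candidate at once.
# 96 layers => at most 96 serial optimization steps, however wide the
# population is.

DEPTH = 96   # stand-in for GPT-3's 96 blocks
POP = 32     # parallel candidates ("width")
LR = 0.1

def loss(x):
    return (x - 3.0) ** 2  # toy objective with its minimum at x = 3

def grad(x):
    return 2.0 * (x - 3.0)

population = [i / 4.0 for i in range(POP)]  # initial guesses 0.0 .. 7.75
for _ in range(DEPTH):                       # serial depth
    population = [x - LR * grad(x) for x in population]  # parallel width

best = min(population, key=loss)
print(round(best, 4))  # converges very near 3.0
```

On an easy objective like this, 96 serial steps are plenty; the question in the text is whether 96 steps are enough for the kind of optimization a mesa-optimizer would need, which is much less obvious for stochastic methods like genetic algorithms.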

Still, my impression is you need 1000+ generations to get interesting results with genetic algorithms, which seems like a lot of serial computation relative to GPT-3's budget...

John_Maxwell's Shortform

/r/tressless is about 6 times as big FYI.

The way I'm currently thinking about it is that reddit was originally designed as a social news website, and you have to tack on a bunch of extras if you want your subreddit to do knowledge accumulation, whereas phpBB gets you that with much less effort. (It could be as simple as having a culture of "There's already a thread for that here; you should add your post to it.")

John_Maxwell's Shortform

Another point is that if LW and a hypothetical phpBB forum have different "cognitive styles", it could be valuable to keep both around for the sake of cognitive diversity.

John_Maxwell's Shortform

Progress Studies: Hair Loss Forums

I still have about 95% of my hair. But I figure it's best to be proactive. So over the past few days I've been reading a lot about how to prevent hair loss.

My goal here is to get a broad overview (i.e. I don't want to put in the time necessary to understand what a 5-alpha-reductase inhibitor actually is, beyond just "an antiandrogenic drug that helps with hair loss"). I want to identify safe, inexpensive treatments that have both research and anecdotal support.

In the hair loss world, the "Big 3" refers to 3 well-known treatments for hair loss: finasteride, minoxidil, and ketoconazole. These treatments all have problems. Some finasteride users report permanent loss of sexual function. If you go off minoxidil, you lose all the hair you gained, and some say it wrinkles their skin. Ketoconazole doesn't work very well.

To research treatments beyond the Big 3, I've been using various tools, including both Google Scholar and a "custom search engine" I created for digging up anecdotes from forums. Basically, I take whatever query I'm interested in ("pumpkin seed oil", for instance), append a long OR-joined chain of site: restrictions (one per forum), and then search on Google.
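That query-building step is easy to script. The forum domains below are hypothetical placeholders, not the actual sites I used:

```python
# Build a Google query restricted to a fixed set of forums via site: clauses.
# These domains are hypothetical placeholders.
FORUM_SITES = [
    "example-hairforum-1.com",
    "example-hairforum-2.net",
    "example-hairforum-3.org",
]

def forum_query(terms):
    # Google treats OR as a disjunction across the site: restrictions.
    site_clause = " OR ".join(f"site:{d}" for d in FORUM_SITES)
    return f'"{terms}" {site_clause}'

print(forum_query("pumpkin seed oil"))
```

The output string can be pasted straight into Google, or wired into a Programmable Search Engine configured with the same site list.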

Doing this repeatedly has left me feeling like a geologist who's excavated a narrow stratigraphic column of Internet history.

And my big takeaway is how much dumber people got collectively between the "old school phpBB forum" layer and the "subreddit" layer.

This is a caricature, but I don't think it would be totally ridiculous to summarize discussion on /r/tressless as:

  1. Complaining about Big 3 side effects
  2. Complaining that the state of the art in hair loss hasn't advanced in the past 10 years
  3. Putdowns for anyone who tries anything which isn't the Big 3

If I were conspiracy-minded, I would wonder if Big 3 manufacturers had paid shills who trolled online forums making fun of anyone who tries anything which isn't their product. It's just the opposite of the behavior you'd expect based on game theory: Someone who tries something new individually runs the risk of new side effects, or wasting their time and money, with some small chance of making a big discovery which benefits the collective. So a rational forum user's response to someone trying something new should be: "By all means, please be the guinea pig". And yet that seems uncommon.

Compared with reddit, discussion of nonstandard treatments on old school forums goes into greater depth--I stumbled across a thread on an obscure treatment which was over 1000 pages long. And the old school forums have a higher capacity for innovation... here is a website that an old school forum user made for a DIY formula he invented, "Zix", which a lot of forum users had success with. (The site has a page explaining why we should expect the existence of effective hair loss treatments that the FDA will never approve.) He also links to a forum friend who started building and selling custom laser helmets for hair regrowth. (That's another weird thing about online hair loss forums... Little discussion of laser hair regrowth, even though it's FDA approved, intuitively safe, and this review found it works better than finasteride or minoxidil.)

So what happened with the transition to reddit? Some hypotheses:

  • Generalized eternal September
  • Internet users have a shorter attention span nowadays
  • Upvoting/downvoting facilitates groupthink
  • reddit's "hot" algorithm discourages the production of deep content; the "bump"-driven discussion structure of old school forums allows for threads which are over 1000 pages long
  • Weaker community feel due to intermixing with the entire reddit userbase
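On the "hot" hypothesis: reddit's ranking (from its open-sourced code, roughly) scores a post as the log of its net votes plus a linear recency bonus, so an aging thread needs exponentially more votes to stay visible, while a phpBB-style forum simply sorts by last reply. A sketch of both, with made-up example numbers:

```python
# reddit's "hot" rank (simplified from its open-source code) vs a
# phpBB-style "bump" sort. Votes count logarithmically; recency counts
# linearly, so every ~12.5 hours (45000 s) of age costs a thread a
# factor-of-10 in votes.
import math

def reddit_hot(score, posted_at):
    order = math.log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    seconds = posted_at - 1134028003  # seconds since reddit's epoch (2005-12-08)
    return sign * order + seconds / 45000

def phpbb_sort(threads):
    # "Bump"-driven: most recently replied-to thread first, regardless of age.
    return sorted(threads, key=lambda t: t["last_reply_at"], reverse=True)

now = 1_600_000_000
old_deep_thread = {"score": 5000, "posted_at": now - 86400 * 30,
                   "last_reply_at": now}            # month-old megathread, active today
new_meme = {"score": 50, "posted_at": now - 3600,
            "last_reply_at": now - 3600}            # hour-old post

# Under "hot", the fresh post outranks the month-old megathread:
print(reddit_hot(5000, old_deep_thread["posted_at"]) <
      reddit_hot(50, new_meme["posted_at"]))        # True

# Under bump sorting, the active megathread stays on top:
print(phpbb_sort([old_deep_thread, new_meme])[0]["score"])  # 5000
```

Under "hot", a 1000-page thread falls off the front page within a day or two no matter how good it is; under bump ordering it stays visible for as long as people keep contributing.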

I'm starting to wonder if we should set up a phpBB style AI safety discussion forum. I have hundreds of thousands of words of AI content in my personal notebook, only a small fraction of which I've published. Posting to LW seems to be a big psychological speed bump for me. And I'm told that discussion on the Alignment Forum represents a fairly narrow range of perspectives within the broader AI safety community, perhaps because of the "upvoting/downvoting facilitates groupthink" thing.

The advantage of upvoting/downvoting seems to be a sort of minimal quality control--there is less vulnerability to individual fools as described in this post. But I'm starting to wonder if some of the highs got eliminated along with the lows.

Anyway, please send me a message if an AI safety forum sounds interesting to you.

The Box Spread Trick: Get rich slightly faster

Does anyone have thoughts on whether buying Treasury Inflation-Protected Securities (probably in the form of an ETF) on margin would be a good way to hedge against currency devaluation?

ricraz's Shortform

There's been a fair amount of discussion of that sort of thing here. There are also groups outside LW thinking about social technology, such as RadicalxChange.

Imagine you took 5 separate LWers and asked them to create a unified consensus response to a given article. My guess is that they’d learn more through that collective effort, and produce a more useful response, than if they spent the same amount of time individually evaluating the article and posting their separate replies.

I'm not sure. If you put those 5 LWers together, I think there's a good chance that the highest status person speaks first and then the others anchor on what they say and then it effectively ends up being like a group project for school with the highest status person in charge. Some related links.

ricraz's Shortform
  1. All else equal, the harder something is, the less we should do it.

  2. My quick take is that writing lit reviews/textbooks is a comparative disadvantage of LW relative to the mainstream academic establishment.

In terms of producing reliable knowledge... if people actually care about whether something is true, they can always offer a cash prize for the best counterargument (which could of course constitute citation of academic research). The fact that people aren't doing this suggests to me that for most claims on LW, there isn't any (reasonably rich) person who cares deeply re: whether the claim is true. I'm a little wary of putting a lot of effort into supply if there is an absence of demand.

(I guess the counterargument is that accurate knowledge is a public good, so an individual's willingness to pay doesn't get you a complete picture of the value accurate knowledge brings. Maybe what we need is a way to crowdfund bounties for the best argument related to something.)

(I agree that LW authors would ideally engage more with each other and academic literature on the margin.)

Learning human preferences: black-box, white-box, and structured white-box access

Let's say I'm trying to describe a hockey game. Modularizing the preferences from other aspects of the team algorithm makes it much easier to describe what happens at the start of the second period, when the two teams switch sides.

The fact that humans find an abstraction useful is evidence that an AI will as well. The notion that agents have preferences helps us predict how people will change their plans for achieving their goals when they receive new information. Same for an AI.
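The hockey example can be made concrete: if the net each team attacks is stored as an explicit preference, the period-change side swap is a one-line update, whereas baking the direction into every behavior rule would mean rewriting them all. A minimal sketch (the class and field names are invented for illustration):

```python
# Modularizing "which net this team attacks" as an explicit preference.
# Swapping sides at the start of the second period is then a single flip,
# instead of an edit to every rule that mentions a direction.
from dataclasses import dataclass

@dataclass
class Team:
    name: str
    attacks_net: str  # "east" or "west" -- the modular preference

    def shot_direction(self):
        # Behaviors consult the preference instead of hard-coding it.
        return self.attacks_net

def switch_sides(team):
    team.attacks_net = "west" if team.attacks_net == "east" else "east"

home = Team("Home", attacks_net="east")
away = Team("Away", attacks_net="west")

# Start of the second period: both teams swap ends.
switch_sides(home)
switch_sides(away)
print(home.shot_direction(), away.shot_direction())  # west east
```

The same description compresses the observed behavior change to one updated variable, which is the predictive payoff of modularizing preferences that the paragraph describes.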
