Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a special post for quick takes by John_Maxwell. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.


Progress Studies: Hair Loss Forums

I still have about 95% of my hair. But I figure it's best to be proactive. So over the past few days I've been reading a lot about how to prevent hair loss.

My goal here is to get a broad overview (i.e. I don't want to put in the time necessary to understand what a 5-alpha-reductase inhibitor actually is, beyond just "an antiandrogenic drug that helps with hair loss"). I want to identify safe, inexpensive treatments that have both research and anecdotal support.

In the hair loss world, the "Big 3" refers to 3 well-known treatments for hair loss: finasteride, minoxidil, and ketoconazole. These treatments all have problems. Some finasteride users report permanent loss of sexual function. If you go off minoxidil, you lose all the hair you gained, and some say it wrinkles their skin. Ketoconazole doesn't work very well.

To research treatments beyond the Big 3, I've been using various tools, including both Google Scholar and a "custom search engine" I created for digging up anecdotes from forums. Basically, I take whatever query I'm interested in ("pumpkin seed oil" for instance), append a long chain of forum filters joined with OR, and then search on Google.
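
The query trick above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the OR-joined terms are Google `site:` filters, the forum domains are made-up placeholders, and `build_forum_query` and `google_url` are names invented for illustration.

```python
from urllib.parse import quote_plus

# Hypothetical placeholder domains; swap in the forums you actually want to search.
FORUM_DOMAINS = [
    "forum-one.example",
    "forum-two.example",
    "forum-three.example",
]


def build_forum_query(topic, domains=FORUM_DOMAINS):
    """Return an exact-phrase query restricted to the given sites."""
    site_filter = " OR ".join(f"site:{domain}" for domain in domains)
    return f'"{topic}" ({site_filter})'


def google_url(query):
    """Return a Google search URL for the query, with the query URL-encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_forum_query("pumpkin seed oil")
print(query)
print(google_url(query))
```

Keeping the domain list in one place means the same filter can be reused for every new treatment you want to look up.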

Doing this repeatedly has left me feeling like a geologist who's excavated a narrow stratigraphic column of Internet history.

And my big takeaway is how much dumber people got collectively between the "old school phpBB forum" layer and the "subreddit" layer.

This is a caricature, but I don't think it would be totally ridiculous to summarize discussion on /r/tressless as:

  1. Complaining about Big 3 side effects
  2. Complaining that the state of the art in hair loss hasn't advanced in the past 10 years
  3. Putdowns for anyone who tries anything which isn't the Big 3

If I were conspiracy-minded, I would wonder if Big 3 manufacturers had paid shills who trolled online forums making fun of anyone who tries anything which isn't their product. It's just the opposite of the behavior you'd expect based on game theory: Someone who tries something new individually runs the risk of new side effects, or wasting their time and money, with some small chance of making a big discovery which benefits the collective. So a rational forum user's response to someone trying something new should be: "By all means, please be the guinea pig". And yet that seems uncommon.

Compared with reddit, discussion of nonstandard treatments on old school forums goes into greater depth--I stumbled across a thread on an obscure treatment which was over 1000 pages long. And the old school forums have a higher capacity for innovation... here is a website that an old school forum user made for a DIY formula he invented, "Zix", which a lot of forum users had success with. (The site has a page explaining why we should expect the existence of effective hair loss treatments that the FDA will never approve.) He also links to a forum friend who started building and selling custom laser helmets for hair regrowth. (That's another weird thing about online hair loss forums... Little discussion of laser hair regrowth, even though it's FDA approved, intuitively safe, and this review found it works better than finasteride or minoxidil.)

So what happened with the transition to reddit? Some hypotheses:

  • Generalized eternal September
  • Internet users have a shorter attention span nowadays
  • Upvoting/downvoting facilitates groupthink
  • reddit's "hot" algorithm discourages the production of deep content; the "bump"-driven discussion structure of old school forums allows for threads which are over 1000 pages long
  • Weaker community feel due to intermixing with the entire reddit userbase
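
To make the "hot" vs. "bump" hypothesis concrete, here is a toy comparison of the two orderings. The scoring function is only loosely modeled on publicly described versions of reddit's hot ranking (log-scaled votes plus a bonus for recent creation). The constants, the `Thread` fields, and the example threads are all assumptions, not reddit's or phpBB's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class Thread:
    title: str
    created_hours: float     # when the thread was started (hours since an arbitrary epoch)
    last_reply_hours: float  # when the most recent reply was posted
    score: int               # net upvotes (ignored by bump ordering)


def hot_score(thread, hours_per_point=12.5):
    """Toy 'hot' score: log-scaled votes plus a bonus that grows with creation time."""
    sign = (thread.score > 0) - (thread.score < 0)
    magnitude = math.log10(max(abs(thread.score), 1))
    return sign * magnitude + thread.created_hours / hours_per_point


def bump_score(thread):
    """Bump ordering: the thread with the most recent reply goes to the top."""
    return thread.last_reply_hours


threads = [
    Thread("Long-running experiment log", created_hours=0.0, last_reply_hours=99.0, score=500),
    Thread("Quick question", created_hours=95.0, last_reply_hours=96.0, score=3),
]

hot_order = sorted(threads, key=hot_score, reverse=True)
bump_order = sorted(threads, key=bump_score, reverse=True)

print([t.title for t in hot_order])
print([t.title for t in bump_order])
```

In this toy setup the old thread can never climb back up the "hot" list no matter how many replies it gets, because only creation time and votes count. Under bump ordering, every new reply puts it back on top.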

I'm starting to wonder if we should set up a phpBB style AI safety discussion forum. I have hundreds of thousands of words of AI content in my personal notebook, only a small fraction of which I've published. Posting to LW seems to be a big psychological speed bump for me. And I'm told that discussion on the Alignment Forum represents a fairly narrow range of perspectives within the broader AI safety community, perhaps because of the "upvoting/downvoting facilitates groupthink" thing.

The advantage of upvoting/downvoting seems to be a sort of minimal quality control--there is less vulnerability to individual fools as described in this post. But I'm starting to wonder if some of the highs got eliminated along with the lows.

Anyway, please send me a message if an AI safety forum sounds interesting to you.

I've noticed a similar trend in a very different area. In various strategic games there has IMO been a major drop in quality of discussion and content thanks to the shift from "discuss strategy on old-style forums and blogs" to "discuss strategy on group chats (Skype/Discord/Slack) and Reddit".

The former was much better at creating "permanent information" that could easily be linked and referred to; the latter probably has a higher volume of messages sent, but information is much more ephemeral and tends to be lost if you weren't in the right place at the right time. It's a lot harder to refer to "that influential Discord conversation a few weeks ago" than it is to link to a forum thread!

I would be interested in seeing what happens if someone creates a phpBB style AI Safety forum. If it works better than what we have, we should probably just switch the existing stuff we have towards a more similar architecture (though of course, the right tradeoffs might depend on the number of users, so maybe the right choice is to have both architectures in parallel). 

Another point is that if LW and a hypothetical phpBB forum have different "cognitive styles", it could be valuable to keep both around for the sake of cognitive diversity.

I checked the obvious subreddit (r/hairloss), and it seems to do just about everything wrong. It's not just that the Hot algorithm is favoring ephemeral content over accumulation of knowledge; they also don't have an FAQ, or any information in the sidebar, or active users with good canned replies to paste, or anything like that. I also note that most of the phpBBs mentioned are using subforums, to give the experimenters a place to talk without a stream of newbie questions, etc., which the subreddit is also missing.

I think the phpBB era had lots of similarly-neglected forums, which (if they somehow got traffic) would have been similarly bad. I think the difference is that Reddit is propping up this forum with a continuous stream of users, where a similarly-neglected phpBB would have quickly fallen to zero traffic.

So... I think this may be a barriers-to-entry story, where the relevant barrier is not on the user side, but on the administrator side; most Reddit users can handle signing up for a phpBB just fine, but creating a phpBB implies a level of commitment that usually implies you'll set up some subforums, create an FAQ, and put nonzero effort into making it good.

/r/tressless is about 6 times as big FYI.

The way I'm currently thinking about it is that reddit was originally designed as a social news website, and you have to tack on a bunch of extras if you want your subreddit to do knowledge-accumulation, but phpBB gets you that with much less effort. (Could be as simple as having a culture of "There's already a thread for that here, you should add your post to it.")

A friend and I went on a long drive recently and listened to this podcast with Andrew Critch on ARCHES. On the way back from our drive we spent some time brainstorming solutions to the problems he outlines. Here are some notes on the podcast + some notes on our brainstorming.

In a possibly inaccurate nutshell, Critch argues that what we think of as the "alignment problem" is most likely going to get solved because there are strong economic incentives to solve it. However, Critch is skeptical of forming a singleton--he says people tend to resist that kind of concentration of power, and it will be hard for an AI team that has this as their plan to recruit team members. Critch says there is really a taxonomy of alignment problems:

  • single-single, where we have a single operator aligning a single AI with their preferences
  • single-multi, where we have a single operator aligning multiple AIs with their preferences
  • multi-single, where we have multiple operators aligning a single AI with their preferences
  • multi-multi, where we have multiple operators aligning multiple AIs with their preferences

Critch says that although there are commercial incentives to solve the single-single alignment problem, there aren't commercial incentives to solve all of the others. He thinks the real alignment failures might look like the sort of diffusion of responsibility you see when navigating bureaucracy.

I'm a bit skeptical of this perspective. For one thing, I'm not convinced commercial incentives for single-single alignment will extrapolate well to exotic scenarios such as the "malign universal prior" problem--and if hard takeoff happens then these exotic scenarios might come quickly. For another thing, although I can see why advocating a singleton would be a turnoff to the AI researchers that Critch is pitching, I feel like the question of whether to create a singleton deserves more than the <60 seconds of thought that an AI researcher having a casual conversation with Critch likely puts into their first impression. If there are commercial incentives to solve single-single alignment but not other kinds, shouldn't we prefer that single-single is the only kind which ends up being load-bearing? Why can't we form an aligned singleton and then tell it to design a mechanism by which everyone can share their preferences and control what the singleton does (democracy but with better reviews)?

I guess a big issue is the plausibility of hard takeoff, because if hard takeoff is implausible, that makes it less likely that a singleton will form under any circumstances, and it also means that exotic safety problems aren't likely to crop up as quickly. If this is Critch's worldview then I could see why he is prioritizing the problems he is prioritizing.

Anyway my friend and I spent some time brainstorming about how to solve versions of the alignment problem besides single-single. Since we haven't actually read ARCHES or much relevant literature, it's likely that much of what comes below is clueless, but it might also have new insights due to being unconstrained by existing paradigms :P

One scenario which is kind of in between multi-single and multi-multi alignment is a scenario where everyone has an AI agent which negotiates with some kind of central server on their behalf. We could turn multi-single into this scenario by telling the single AI to run internal simulations of everyone's individual AI agent, or we could turn multi-multi into this scenario if we have enough cooperation/enforcement for different people to abide by the agreements that their AI agents make with one another on their behalf.

Most of the game theory we're familiar with deals with a fairly small space of agreements it is possible to make, but it occurred to us that in an ideal world, these super smart AIs would be doing a lot of creative thinking, trying to figure out a clever way for everyone's preferences to be satisfied simultaneously. Let's assume each robot agent has a perfect model of its operator's preferences (or can acquire a perfect model as needed by querying the operator). The central server queries the agents about how much utility their operator assigns to various scenarios, or whether they prefer Scenario A to Scenario B, or something like that. And the agents can respond either truthfully or deceptively ("data poisoning"), trying to navigate towards a final agreement which is as favorable as possible for their operator. Then the central server searches the space of possible agreements in a superintelligent way and tries to find an agreement that everyone likes. (You can also imagine a distributed version of this where there is no central server and individual robot agents try to come up with a proposal that everyone likes.)
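
Here is a very rough sketch of the central-server idea, shrunk down to something runnable. Everything in it is an assumption made for illustration: the "agreements" are just ways of splitting 6 units of a resource among three operators, each agent answers utility queries with a made-up concave utility function, and the server uses a maximin rule (help the worst-off operator first, then break ties by total utility). A real version would search a vastly larger space and would have to cope with deceptive answers.

```python
import math
from itertools import product


def candidate_agreements(total=6, n_operators=3):
    """Enumerate every way to split `total` units among the operators."""
    return [
        split
        for split in product(range(total + 1), repeat=n_operators)
        if sum(split) == total
    ]


def make_agent(index, satiation):
    """Return an agent that answers utility queries for operator `index`.

    Hypothetical preferences: utility is concave in the operator's own share
    and stops growing once the share reaches `satiation`.
    """
    def report_utility(agreement):
        return math.sqrt(min(agreement[index], satiation))
    return report_utility


def choose_agreement(agents, agreements):
    """Pick the agreement that is best for the worst-off operator (maximin)."""
    def key(agreement):
        utilities = [agent(agreement) for agent in agents]
        return (min(utilities), sum(utilities))
    return max(agreements, key=key)


agents = [make_agent(0, satiation=1), make_agent(1, satiation=4), make_agent(2, satiation=6)]
agreements = candidate_agreements()
best = choose_agreement(agents, agreements)
print(best, [round(agent(best), 3) for agent in agents])
```

Nothing here forces `report_utility` to be truthful. An agent could return inflated numbers to drag the chosen agreement toward its operator, which is the "data poisoning" problem described above.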

How does this compare to the scenario I mentioned above, where an aligned AI designs a mechanism and collects preferences from humans directly without any robot agent as an intermediary? The advantage of robot agents is that if everyone gets a superintelligent agent, then it is harder for individuals to gain advantage through the use of secret robot agents, so the overall result ends up being more fair. However, it arguably makes the mechanism design problem harder: If it is humans who are answering preference queries rather than superintelligent robot agents, since humans have finite intelligence, it will be harder for them to predict the strategic results of responding in various ways to preference queries, so maybe they're better off just stating their true preferences to minimize downside risk. Additionally, an FAI is probably better at mechanism design than humans. But then again, if the mechanism design for discovering fair agreements between superintelligent robot agents fails, and a single agent manages to negotiate really well on behalf of its owner's preferences, then arguably you are back in the singleton scenario. So maybe the robot agents scenario has the singleton scenario as its worst case.

I said earlier that it will be harder for humans to predict the strategic results of responding in various ways to preference queries. But we might be able to get a similar result for supersmart AI agents by making use of secret random numbers during the negotiation process to create enough uncertainty where revealing true preferences becomes the optimal strategy. (For example, you could imagine two mechanisms, one of which incentivizes strategic deception in one direction, and the other incentivizes strategic deception in the other direction; if we collect preferences and then flip a coin regarding which mechanism to use, the best strategy might be to do no deception at all.)
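
A toy numerical version of the coin-flip idea, under heavy assumptions: the two payoff functions below are contrived so that their biases cancel exactly, and `TRUE_VALUE`, `BIAS`, and the range of allowed reports are arbitrary. It illustrates the intuition and is not evidence that real mechanisms would cancel this cleanly.

```python
TRUE_VALUE = 5        # the operator's real preference (hypothetical)
BIAS = 4              # how strongly each toy mechanism rewards lying
REPORTS = range(0, 11)


def payoff_a(report):
    """Toy mechanism A: rewards over-reporting, with a quadratic penalty for big lies."""
    gap = report - TRUE_VALUE
    return BIAS * gap - gap ** 2


def payoff_b(report):
    """Toy mechanism B: the mirror image, rewards under-reporting."""
    gap = report - TRUE_VALUE
    return -BIAS * gap - gap ** 2


def payoff_coin_flip(report):
    """Expected payoff when a fair coin picks the mechanism after reports are in."""
    return 0.5 * payoff_a(report) + 0.5 * payoff_b(report)


def best_report(payoff):
    """Return the report that maximizes the given payoff function."""
    return max(REPORTS, key=payoff)


print("best report under A:        ", best_report(payoff_a))
print("best report under B:        ", best_report(payoff_b))
print("best report under coin flip:", best_report(payoff_coin_flip))
```

Under A alone the best report is 7, under B alone it is 3, and under the coin flip the linear bias terms cancel, leaving only the penalty for deviating from the truth, so the best report is the true value of 5.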

Another situation to consider is one where we don't have as much cooperation/enforcement and individual operators are empowered to refuse to abide by any agreement--let's call this "declaring war". In this world, we might prefer to overweight the preferences of more powerful players, because if everyone is weighted equally regardless of power, then the powerful players might have an incentive to declare war and get more than their share. However it's unclear how to do power estimation in an impartial way. Also, such a setup incentivizes accumulation of power.

One idea which seems like it might be helpful at first blush would be to try to invent some way of verifiably implementing particular utility functions, so competing teams could know that a particular AI will take their utility function into account. However this could be abused as follows: In the same way the game of chicken incentivizes tearing out your steering wheel so the opponent has no choice but to swerve, Team Evil could verifiably implement a particular utility function in their AI such that their AI will declare war unless competing teams verifiably implement a utility function Team Evil specifies.

Anyway looking back it doesn't seem like what I've written actually does much for the "bureaucratic diffusion of responsibility" scenario. I'd be interested to know concretely how this might occur. Maybe what we need is a mechanism for incentivizing red teaming/finding things that no one is responsible for/acquiring responsibility for them?

Someone wanted to know about the outcome of my hair loss research so I thought I would quickly write up what I'm planning to try for the next year or so. No word on how well it works yet.

Most of the ideas are from this review:

I think this should be safer/less sketchy than the big 3 and fairly low cost, but plausibly less effective in expectation; let me know if you disagree.

How are things progressing?

I didn't end up sticking to this because of various life disruptions. I think it was a bit helpful but I'm planning to try something more intensive next time.

Did you end up trying the microneedling? I'm curious about that route.

Yes, I tried it. It gave me a headache but I would guess that's not common. Think it's probably a decent place to start.

Related to the earlier discussion of weighted voting allegedly facilitating groupthink:

An interesting litmus test for groupthink might be: What has LW changed its collective mind about? By that I mean: the topic was discussed on LW, there was a particular position on the issue that was held by the majority of users, new evidence/arguments came in, and now there's a different position which is held by the majority of users. I'm a bit concerned that nothing comes to mind which meets these criteria? I'm not sure it has much to do with weighted voting because I can't think of anything from LW 1.0 either.

  • Replication Crisis definitely hit hard. Lots of stuff there. 
  • People's timelines have changed quite a bit. People used to plan for 50-60 years, now it's much more like 20-30 years. 
  • Bayesianism is much less the basis for stuff. I think this one is still propagating, but I think Embedded Agency had a big effect here, at least on me and a bunch of other people I know.
  • There were a lot of shifts on the spectrum "just do explicit reasoning for everything" to "figuring out how to interface with your System 1 sure seems really important". I think Eliezer was mostly ahead of the curve here, and early on in LessWrong's lifetime we kind of fell prey to following our own stereotypes.
  • A lot of EA related stuff. Like, there is now a lot of good analysis and thinking about how to maximize impact, and if you read old EA-adjacent discussions, they sure strike me as getting a ton of stuff wrong.
  • Spaced repetition. I think the pendulum on this swung somewhat too far, but I think people used to be like "yeah, spaced repetition is just really great and you should use it for everything" and these days the consensus is more like "use spaced repetition in a bunch of narrow contexts, but overall memorizing stuff isn't that great". I do actually think rationalists are currently underusing spaced repetition, but overall I feel like there was a large shift here. 
  • Nootropics. I feel like in the past many more people were like "you should take this whole stack of drugs to make you smarter". I see that advice a lot less, and would advise many fewer people to follow that advice, though not actually sure how much I reflectively endorse that.
  • A bunch of AI Alignment stuff in the space of "don't try to solve the AI Alignment problem directly, instead try to build stuff that doesn't really want to achieve goals in a coherent sense and use that to stabilize the situation". I think this was kind of similar to the S1 stuff, where Eliezer seemed ahead of the curve, but the community consensus was kind of behind. 

I feel like there was a mass community movement (not unanimous but substantial) from AGI-scenarios-that-Eliezer-has-in-mind to AGI-scenarios-that-Paul-has-in-mind, e.g. more belief in slow takeoff + multipolar + "What Failure Looks Like" and less belief in fast takeoff + decisive strategic advantage + recursive self-improvement + powerful agents coherently pursuing misaligned goals. This was mostly before my time, I could be misreading things, that's just my impression. :-)

Seems true. Notably, if I have my cynical hat on (and I think I probably do?) it depended on having Paul say a bunch of things about it, and Paul had previously also established himself as a local "thinker celebrity". 

If I have my somewhat less cynical hat on, I do honestly think our status gradients do a decent job of tracking "person who is actually good at figuring things out", such that "local thinker celebrity endorses a thing" is not just crazy, it's a somewhat reasonable filtering mechanism. But I do think the effect is real.

Priming? Though that does feel like a fairly weak example.

In this reaction to Critch's podcast, I wrote about some reasons to think that a singleton would be preferable to a multipolar scenario. Here's another rather exotic argument.

[The dark forest theory] is explained very well near the end of the science fiction novel, The Dark Forest by Liu Cixin.


When two [interstellar] civilizations meet, they will want to know if the other is going to be friendly or hostile. One side might act friendly, but the other side won't know if they are just faking it to put them at ease while armies are built in secret. This is called chains of suspicion. You don't know for sure what the other side's intentions are. On Earth this is resolved through communication and diplomacy. But for civilizations in different solar systems, that's not possible due to the vast distances and time between message sent and received. Bottom line is, every civilization could be a threat and it's impossible to know for sure, therefore they must be destroyed to ensure your survival.

Source. (Emphasis mine.)

Secure second strike is the ability to retaliate with your own nuclear strike if someone hits you with nukes. Secure second strike underpins mutually assured destruction. If nuclear war had a "first mover advantage", where whoever launches nukes first wins because the country that is hit with nukes is unable to retaliate, that would be much worse from a game theory perspective, because there's an incentive to be the first mover and launch a nuclear war (especially if you think your opponent might do the same).
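
The incentive difference can be illustrated with a toy two-player game. All the payoff numbers below are invented for illustration (0 for peace, -100 for being destroyed, smaller losses for fighting a war you survive), and `p_opponent_strikes` is your subjective probability that the other side launches.

```python
# payoffs[(my_action, their_action)] = my payoff. All numbers are made up.
FIRST_MOVER_ADVANTAGE = {
    ("wait", "wait"): 0,        # peace
    ("strike", "wait"): -1,     # I strike first and survive, at some cost
    ("wait", "strike"): -100,   # I am destroyed and cannot retaliate
    ("strike", "strike"): -50,  # both launch; roughly a coin flip over who lands first
}

SECURE_SECOND_STRIKE = {
    ("wait", "wait"): 0,         # peace
    ("strike", "wait"): -100,    # they retaliate, so I am destroyed too
    ("wait", "strike"): -100,    # I am destroyed (and retaliate)
    ("strike", "strike"): -100,  # mutual destruction
}


def expected_payoff(payoffs, my_action, p_opponent_strikes):
    """Expected payoff of my action given my belief about the opponent."""
    return (
        (1 - p_opponent_strikes) * payoffs[(my_action, "wait")]
        + p_opponent_strikes * payoffs[(my_action, "strike")]
    )


def best_action(payoffs, p_opponent_strikes):
    """Return whichever of 'wait' or 'strike' has the higher expected payoff."""
    return max(
        ("wait", "strike"),
        key=lambda action: expected_payoff(payoffs, action, p_opponent_strikes),
    )


for p in (0.01, 0.05, 0.5):
    print(p, best_action(FIRST_MOVER_ADVANTAGE, p), best_action(SECURE_SECOND_STRIKE, p))
```

With these made-up numbers, a suspicion of only a few percent is enough to make striking first the better bet in the first-mover-advantage world, while under secure second strike waiting is at least as good for any belief short of certainty.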

My understanding is that the invention of nuclear submarines was helpful for secure second strike. There is so much ocean for them to hide in that it's difficult to track and eliminate all of your opponent's nuclear submarines and ensure they won't be able to hit you back.

However, in Allan Dafoe's article AI Governance: Opportunity and Theory of Impact, he mentions that AI processing of undersea sensors could increase the risk of nuclear war (presumably because it makes it harder for nuclear submarines to hide).

Point being, we don't know what the game theory of a post-AGI world looks like. And we really don't know what interstellar game theory between different AGIs looks like. ("A colonized solar system is plausibly a place where predators can see most any civilized activities of any substantial magnitude, and get to them easily if not quickly."--source.) It might be that the best strategy is for multipolar AIs to unify into a singleton anyway.

Potential counterargument: Second-strike capabilities are still relevant in the interstellar setting. You could build a bunch of hidden ships in the Oort cloud to ram the foe and do equal devastation if the other party does it first, deterring a first strike even with tensions and an absence of communication. Further, while the "ram with high-relativistic objects" idea works pretty well for preemptively ending a civilization confined to a handful of planets, AIs would be able to colonize a bunch of little asteroids and KBOs and comets in the Oort cloud, and the higher level of dispersal would lead to preemptive total elimination being less viable.

That's possible, but I'm guessing that it's not hard for a superintelligent AI to suddenly swallow an entire system using something like gray goo.

Lately I've been examining the activities I do to relax and how they might be improved. If you haven't given much thought to this topic, Meaningful Rest is excellent background reading.

An interesting source of info for me has been lsusr's posts on cutting out junk media: 1, 2, 3. Although I find lsusr's posts inspiring, I'm not sure I want to pursue the same approach myself. lsusr says: "The harder a medium is to consume (or create, as applicable) the smarter it makes me." They responded to this by cutting all the easy-to-consume media out of their life.

But when I relax, I don't necessarily want to do something hard. I want to do something which rejuvenates me. (See "Meaningful Rest" post linked previously.)

lsusr's example is inspiring in that it seems they got themselves studying things like quantum field theory for fun in their spare time. But they also noted that "my productivity at work remains unchanged", and ended up abandoning the experiment 9 months in "due to multiple changes in my life circumstances". Personally, when I choose to work on something, I usually expect it to be at least 100x as good a use of my time as random productive-seeming stuff like studying quantum field theory. So given a choice, I'd often rather my breaks rejuvenate me a bit more per minute of relaxation, so I can put more time and effort into my 100x tasks, than have the break be slightly useful on its own.

To adopt a different frame... I'm a fan of the wanting/liking/approving framework from this post.

  • In some sense, +wanting breaks are easy to engage in because it doesn't require willpower to get yourself to do them. But +wanting breaks also tend to be compulsive, and that makes them less rejuvenating (example: arguing online).

  • My point above is that I should mostly ignore the +approving or -approving factor in terms of the break's non-rejuvenating, external effects.

  • It seems like the ideal break is +liking, and enough +wanting that it doesn't require willpower to get myself to do it, and once I get started I can disconnect for hours and be totally engrossed, but not so +wanting that I will be tempted to do it when I should be working or keep doing it late into the night. I think playing the game Civilization might actually meet these criteria for me? I'm not as hooked on it as I used to be, but I still find it easy to get engrossed for hours.

Interested to hear if anyone else wants to share their thinking around this or give examples of breaks which meet the above criteria.