All of John_Maxwell's Comments + Replies

  • Power makes you dumb, stay humble.

  • Tell everyone in the organization that safety is their responsibility, everyone's views are important.

  • Try to be accessible and not intimidating, admit that you make mistakes.

  • Schedule regular chats with underlings so they don't have to take initiative to flag potential problems. (If you think such chats aren't a good use of your time, another idea is to contract someone outside of the organization to do periodic informal safety chats. Chapter 9 is about how organizational outsiders are uniquely well-positioned

... (read more)
-1 Lone Pine 2y
Sure is lovely how the rationalist community is living up to its rationality norms.

Fair point. I also haven't done much posting since adding the bounty to my profile. Was thinking it might attract the attention of people reading the archives, but maybe there just aren't many archive readers.

There is some observational evidence that coffee drinking increases lifespan. I think the proposed mechanism has to do with promoting autophagy. But it looks like decaf works too. (Decaf has a bit of caffeine.)

I think somewhere else I read that unfiltered coffee doesn't improve lifespan, so try to drink the filtered stuff?

In my experience caffeine dependence is not a big deal and might help my sleep cycle.

I'd love to see a link to the unfiltered result. I'm not even sure what "unfiltered" means in this context: eating whole beans (honestly, chocolate-covered espresso beans are delicious, but I don't imagine many people use them as their primary caffeine intake)? Espresso vs. drip? Something else?

Eliezer is a good example of someone who built a lot of status on the back of "breaking" others' unworkable alignment strategies. I found the AI Box experiments especially enlightening in my early days.

Fair enough.

My personal feeling is that poking holes in alignment strategies is easier than coming up with good ones, but I'm also aware that thinking that breaking is easy is probably committing some quantity of typical mind fallacy.

Yeah personally building feels more natural to me.

I agree a leaderboard would be great. I think it'd be cool to have a... (read more)

I appreciate the nudge here to put some of this into action. I hear alarm bells when thinking about formalizing a centralized location for AI safety proposals and information about how they break, but my rough intuition is that if there is a way these can be scrubbed of descriptions of capabilities which could be used irresponsibly to bootstrap AGI, then this is a net positive. At the very least, we should be scrambling to discuss safety controls for already public ML paradigms, in case any of these are just one key insight or a few teraflops away from being world-ending. I would like to hear from others about this topic, though; I'm very wary of being at fault for accelerating the doom of humanity.

I wrote a comment on your post with feedback.

I don't have anything prepared for red teaming at the moment -- I appreciate the offer though! Can I take advantage of it in the future? (Anyone who wants to give me critical feedback on my drafts should send me a personal message!)

4 Tor Økland Barstad 2y
Thanks for the feedback! And yes, do feel free to send me drafts in the future if you want me to look over them. I don't give guarantees regarding amount or speed of feedback, but it would be my intention to try to be helpful :)

I skimmed the post, here is some feedback (context):

  • I'm probably not the best person to red team this since some of my own alignment ideas are along similar lines. I'm also a bit on the optimistic side about alignment more generally -- it might be better to talk to a pessimist.

  • This sounds a bit like the idea of a "low-bandwidth oracle".

  • I think the biggest difficulty is the one you explicitly acknowledged -- boxing is hard.

  • But there are also problems around ensuring that bandwidth is actually limited. If you have a human check to see that the

... (read more)
1 Tor Økland Barstad 2y
I have a first draft ready for part 2 now: Will read it over more, but plan to post within maybe a few days. I have also made a few changes to part 1, and will probably make additional changes to part 1 over time. As you can see if you open the Google Doc, part 2 is not any shorter than part 1. You may or may not interpret that as an indication that I don't make effective use of feedback. Part 3, which I have not finished, is the part that will focus more on proofs. (Edit: It does not. But maybe there will be a future post that focuses on proofs as planned. It is however quite relevant to the topic of proofs the way I think of things.) Any help from anyone in reading over would be appreciated, but at the same time it is not expected :)
5 Tor Økland Barstad 2y
Thanks, that's interesting. Hadn't seen that (insofar as I can remember). Definitely overlap there. Same, and that's a good/crisp way to put it. Will edit at some point so as to follow the first part of that suggestion. Thanks! Some things in that bullet list address stuff I left out to cut length, and stuff I thought I would address in future parts of the series. I also found those parts of the bullet list helpful, but still this exemplifies dilemmas/tradeoffs regarding length. Will try to make more effort to look for things to make shorter based on your advice. And I should have read through this one more before publishing.

Thanks for the reply!

As some background on my thinking here, last I checked there are a lot of people on the periphery of the alignment community who have some proposal or another they're working on, and they've generally found it really difficult to get quality critical feedback. (This is based on an email I remember reading from a community organizer a year or two ago saying "there is a desperate need for critical feedback".)

I'd put myself in this category as well -- I used to write a lot of posts and especially comments here on LW summarizing how I'd g... (read more)

I think you make good points generally about status motives and obstacles for breakers. As counterpoints, I would offer: * Eliezer is a good example of someone who built a lot of status on the back of "breaking" others' unworkable alignment strategies. I found the AI Box experiments especially enlightening in my early days. * There are lots of high-status breakers, and lots of independent status-rewarding communities around the security world. Some of these are whitehat/ethical, like leaderboards for various bug bounty programs, OWASP, etc. Some of them not so much so, like Blackhat/DEFCON in the early days, criminal enterprises, etc. Perhaps here is another opportunity to learn lessons from the security community about what makes a good reward system for the breaker mentality. My personal feeling is that poking holes in alignment strategies is easier than coming up with good ones, but I'm also aware that thinking that breaking is easy is probably committing some quantity of typical mind fallacy. Thinking about how things break, or how to break them intentionally, is probably a skill that needs a lot more training in alignment. Or at least we need a way to attract skilled breakers to alignment problems. I find it to be a very natural fit to post bounties on various alignment proposals to attract breakers to them. Keep upping the bounty, and eventually you have a quite strong signal that a proposal might be workable. I notice your experience of offering a personal bounty does not support this, but I think there is a qualitative difference between a bounty leaderboard with public recognition and a large pipeline of value that can be harvested by a community of good breakers, and what may appear to be a one-off deal offered by a single individual with unclear ancillary status rewards. It may be viable to simply partner with existing crowdsourced bounty program providers (e.g. BugCrowd) to offer a new category of bounty. Traditionally, these services have focused o
2 Tor Økland Barstad 2y
Interesting comment. I feel like I recently have experienced this phenomenon myself (that it's hard to find people who can play "red team"). Do you have any "blue team" ideas for alignment where you in particular would want someone to play "red team"? I would be interested in having someone play "red team" here, but if someone were to do so in a non-trivial manner then it would probably be best to wait at least until I've completed Part 3 (which will take at least weeks, partly since I'm busy with my main job): Could potentially be up for playing red team against you, in exchange for you playing red team against me (but whether I think I could have something to contribute as red team would depend on specifics of what is proposed/discussed - e.g., I'm not familiar with technical specifics of deep learning beyond vague descriptions).

Thanks for writing this! Do you have any thoughts on doing a red team/blue team alignment tournament as described here?

Many! Thanks for sharing. This could easily turn into its own post.

In general, I think this is a great idea. I'm somewhat skeptical that this format would generate deep insights; in my experience successful Capture the Flag / wargames / tabletop exercises work best in the form where each group spends a lot of time preparing for their particular role, but opsec wargames are usually easier to score, so the judge role makes less sense there. That said, in the alignment world I'm generally supportive of trying as many different approaches as possible to see wh... (read more)

Chapter 7 in this book had a few good thoughts on getting critical feedback from subordinates, specifically in the context of avoiding disasters. The book claims that merely encouraging subordinates to give critical feedback is often insufficient, and offers ideas for other things to do.

8 Lone Pine 2y
Can you give us 3-5 bullet points of summary?

And just as I was writing this I came across another good example of the ‘you think you’re in competition with others like you but mostly you’re simply trying to be good enough’

I'm straight, so possibly unreliable, but I remember Michael Curzi as a very good-looking guy with a deep sexy voice. I believe him when he says other dudes are not competition for him 95% of the time. ;-)

Fair, but my experience says this is true even for Area Man, although Area Man will have a harder time meeting the bar.

I wrote a comment here arguing that voting systems tend to encourage conformity. I think this is a way in which the LW voting system could be improved. You might get rid of the unlabeled quality axis and force downvoters to be specific about why they dislike the comment. Maybe readers could specify which weights they want to assign to the remaining axes in order to sort comments.
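As a rough illustration of the reader-weighted sorting idea, here is a minimal sketch. It is not a description of how LW's voting actually works: the `Comment` class, the `sort_comments` helper, and the axis names are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    text: str
    # Net votes per labeled axis (axis names here are placeholders).
    axis_scores: dict = field(default_factory=dict)


def sort_comments(comments, weights):
    """Sort comments by the reader's weighted sum of axis scores, best first."""
    def score(comment):
        return sum(w * comment.axis_scores.get(axis, 0) for axis, w in weights.items())
    return sorted(comments, key=score, reverse=True)


comments = [
    Comment("A", {"agree": 5, "understandable": -2}),
    Comment("B", {"agree": -1, "understandable": 4}),
]

# Two readers with different priorities see different orderings.
clarity_first = sort_comments(comments, {"agree": 0.0, "understandable": 1.0})
agreement_first = sort_comments(comments, {"agree": 1.0, "understandable": 0.0})
```

A reader who sets the agreement weight to zero never has agreement votes influence what they see first, which is one way such a scheme might reduce conformity pressure.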

I think Agree/Disagree is better than True/False, and Understandable/Confusing would be better than Clear/Muddled. Both of these axes are functions of two things (the reader an... (read more)

I'll respond to the "Predict hypothetical sensors" section in this comment.

First, I want to mention that predicting hypothetical sensors seems likely to fail in fairly obvious ways, e.g. you request a prediction about a sensor that's physically nonexistent and the system responds with a bunch of static or something. Note the contrast with the "human simulator" failure mode, which is much less obvious.

But I also think we can train the system to predict hypothetical sensors in a way that's really useful. As in my previous comment, I'll work from the assump... (read more)

If we train on data about what hypothetical sensors should show (e.g. by experiments where we estimate what they would show using other means, or by actually building weird sensors), we could just end up getting predictions of whatever process we used to generate that data. In general the overall situation with these sensors seems quite similar to the original outer-level problem, i.e. training the system to answer "what would an ideal sensor show?" seems to run into the same issues as answering "what's actually going on?" E.g. your supersensor idea #3 seems to be similar to the "human operates SmartVault and knows if tampering occurred" proposal we discussed here. I do think that excising knowledge is a substantive change, I feel like it's effectively banking on "if the model is ignorant enough about what humans are capable of, it needs to err on the side of assuming they know everything." But for intelligent models, it seems hard in general to excise knowledge of whole kinds of sensors (how do you know a lot about human civilization without knowing that it's possible to build a microphone?) without interfering with performance. And there are enough signatures that the excised knowledge is still not in-distribution with hypotheticals we make up (e.g. the possibility of microphones is consistent with everything else I know about human civilization and physics, the possibility of invisible and untouchable cameras isn't) and conservative bounds on what humans can know will still hit the one but not the other.

Thanks for the reply! I'll respond to the "Hold out sensors" section in this comment.

One assumption which seems fairly safe in my mind is that as the operators, we have control over the data our AI gets. (Another way of thinking about it is if we don't have control over the data our AI gets, the game has already been lost.)

Given that assumption, this problem seems potentially solvable

Moreover, my AI may be able to deduce the presence of the additional sensors very cheaply. Perhaps it can notice the sensors, or it can learn about my past actions to get

... (read more)

I wrote a post in response to the report: Eliciting Latent Knowledge Via Hypothetical Sensors.

Some other thoughts:

  • I felt like the report was unusually well-motivated when I put my "mainstream ML" glasses on, relative to a lot of alignment work.

  • ARC's overall approach is probably my favorite out of alignment research groups I'm aware of. I still think running a builder/breaker tournament of the sort proposed at the end of this comment could be cool.

  • Not sure if this is relevant in practice, but... the report talks about Bayesian networks learned via

... (read more)
Thanks for the kind words (and proposal)! I broadly agree with "train a bunch of models and panic if any of them say something is wrong." The main catch is that this only works if none of the models are optimized to say something scary, or to say something different for the sake of being different. We discuss this a bit in this appendix. We're imagining the case where the predictor internally performs inference in a learned model, i.e. we're not explicitly learning a Bayesian network but merely considering possibilities for what an opaque neural net is actually doing (or approximating) on the inside. I don't think this is a particularly realistic possibility, but if ELK fails in this kind of simple case it seems likely to fail in messier realistic cases. (We're actually planning to do a narrower contest focused on ELK proposals.)
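The "train a bunch of models and panic if any of them say something is wrong" rule can be sketched in a few lines. This is only a toy under stated assumptions: the `Reporter` type, the three example reporters, and the observation keys are hypothetical stand-ins, not anything from the ELK report.

```python
from typing import Callable, Sequence

# Hypothetical reporter: maps an observation to True if it claims something is wrong.
Reporter = Callable[[dict], bool]


def should_panic(reporters: Sequence[Reporter], observation: dict) -> bool:
    """Raise the alarm if any reporter in the ensemble flags the observation."""
    return any(reporter(observation) for reporter in reporters)


def trusting(obs: dict) -> bool:
    return False  # never reports a problem


def checks_door(obs: dict) -> bool:
    return obs.get("door_open", False)


def checks_camera(obs: dict) -> bool:
    return obs.get("camera_feed_frozen", False)


ensemble = [trusting, checks_door, checks_camera]
quiet = should_panic(ensemble, {"door_open": False, "camera_feed_frozen": False})
alarm = should_panic(ensemble, {"door_open": False, "camera_feed_frozen": True})
```

The catch described above shows up directly in the `any`: a single reporter that is optimized to say something scary, or to differ for its own sake, makes the rule fire on every input.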

(Well, really I expect it to take <12 months, but planning fallacy and safety margins and time to iterate a little and all that.)

There's also red teaming time, and lag in idea uptake/marketing, to account for. It's possible that we'll have the solution to FAI when AGI gets invented, but the inventor won't be connected to our community and won't be aware of/sold on the solution.

Edit: Don't forget to account for the actual engineering effort to implement the safety solution and integrate it with capabilities work. Ideally there is time for extensive testing and/or formal verification.

I fear this too, at least because it's the most "yelling-at-the-people-onscreen-to-act-differently" scenario that still involves the "hard part" getting solved. I wish there was more discussion of this.

Yes, if you've just created it, then the criteria are meaningfully different in that case for a very limited time.

It's not obvious to me that this is only true right after creation for a very limited time. What is supposed to change after that?

I don't see how we're getting off track. (Your original statement was: 'One such "clever designer" idea is decoupling plan generation from plan execution, which really just means that the plan generator has humans as part of the initial plan executing hardware.' If we're discussing situations where that claim m... (read more)

What changes is that the human sees that the AI is producing plans that try to manipulate humans. It is very likely that the human does not want the AI to produce such plans, and so applies some corrective action against it happening in future.

My point is that plan execution can't be decoupled successfully from plan generation in this way. "Outputting a plan" is in itself an action that affects the world, and an unfriendly superintelligence restricted to only producing plans will still win.

"Outputting a plan" may technically constitute an action, but a superintelligent system (defining "superintelligent" as being able to search large spaces quickly) might not evaluate its effects as such.

Yes, it is possible for plans to score highly under the first criterion but not the second. However, in

... (read more)
Yes, if you've just created it, then the criteria are meaningfully different in that case for a very limited time. But we're getting a long way off track here, since the original question was about what the flaw is with separating plan generation from plan execution as a general principle for achieving AI safety. Are you clearer about my position on that now?

The main problem is that "acting via plans that are passed to humans" is not much different from "acting via plans that are passed to robots" when the AI is good enough at modelling humans.

I agree this is true. But I don't see why "acting via plans that are passed to humans" is what would happen.

I mean, that might be a component of the plan which is generated. But the assumption here is that we've decoupled plan generation from plan execution successfully, no?

So we therefore know that the plan we're looking at (at least at the top level) is the result... (read more)

My point is that plan execution can't be decoupled successfully from plan generation in this way. "Outputting a plan" is in itself an action that affects the world, and an unfriendly superintelligence restricted to only producing plans will still win. Also, I think the last sentence is literally true, but misleading. Yes, it is possible for plans to score highly under the first criterion but not the second. However, in this scenario the humans are presumably going to discourage such plans, so they effectively score the same as the second criterion.

I agree these are legitimate concerns... these are the kind of "deep" arguments I find more persuasive.

In that thread, johnswentworth wrote:

In particular, even if we have a reward signal which is "close" to incentivizing alignment in some sense, the actual-process-which-generates-the-reward-signal is likely to be at least as simple/natural as actual alignment.

I'd solve this by maintaining uncertainty about the "reward signal", so the AI tries to find a plan which looks good under both alignment and the actual-process-which-generates-the-reward-signal. ... (read more)

One such "clever designer" idea is decoupling plan generation from plan execution, which really just means that the plan generator has humans as part of the initial plan executing hardware. You don't need a deep argument to point out an obvious flaw there.

I don't see the "obvious flaw" you're pointing at and would appreciate a more in-depth explanation.

In my mind, decoupling plan generation from plan execution, if done well, accomplishes something like this:

  • You ask your AGI to generate a plan for how it could maximize paperclips.

  • Your AGI generates

... (read more)
The main problem is that "acting via plans that are passed to humans" is not much different from "acting via plans that are passed to robots" when the AI is good enough at modelling humans. I don't think this needs an in-depth explanation, does it? I don't think the given scenario is realistic for any sort of competent AI. There are two sub-cases: If step 1 won't fail due to being read, then the scenario is unrealistic at the "you stop reading the plan at that point" stage. This might be possible for a sufficiently intelligent AI, but that's already a game over case. If step 1 will fail due to the plan being read, a competent AI should be able to predict that step 1 will fail due to being read. The scenario is then unrealistic at the "your AGI generates a plan ..." stage, because it should be assumed that the AI won't produce plans that it predicts won't work. So this leaves only the assumption that the AI is terrible at modelling humans, but can still make plans that should work well in the real world where humans currently dominate. Maybe there is some tiny corner of possibility space where that can happen, but I don't think it contributes much to the overall likelihood unless we can find a way to eliminate everything else.

I had the same view as you, and was persuaded out of it in this thread. Maybe to shift focus a little, one interesting question here is about training. How do you train a plan-generating AI? If you reward plans that sound like they'd succeed, regardless of how icky they seem, then the AI will become useless to you by outputting effective-sounding but icky plans. But if you reward only plans that look nice enough to execute, that tempts the AI to make plans that manipulate whoever is reading them, and we're back at square one.

Maybe that's a good way to look... (read more)

For what it's worth, I often find Eliezer's arguments unpersuasive because they seem shallow. For example:

The insight is in realizing that the hypothetical planner is only one line of outer shell command away from being a Big Scary Thing and is therefore also liable to be Big and Scary in many ways.

This seems like a fuzzy "outside view" sort of argument. (Compare with: "A loaded gun is one trigger pull away from killing someone and is therefore liable to be deadly in many ways." On the other hand, a causal model of a gun lets you explain which specif... (read more)

2 Eli Tyre 2y
My comment on that post asks more-or-less the same question, and also ventures an answer.
I agree that it's a shallow argument presentation, but that's not the same thing as being based on shallow ideas. The context provided more depth, and in general a fair few of the shallowly presented arguments seem to be counters to even more shallow arguments. In general one of the deeper concepts underlying all these shallow arguments appears to be some sort of thesis of "AGI-completeness", in which any single system that can reach or exceed human mental capability on most tasks, will almost certainly reach or exceed on all mental tasks, including deceiving and manipulating humans. Combining that with potentially very much greater flexibility and extensibility of computing substrate means you get an incredibly dangerous situation no matter how clever the designers think they've been. One such "clever designer" idea is decoupling plan generation from plan execution, which really just means that the plan generator has humans as part of the initial plan executing hardware. You don't need a deep argument to point out an obvious flaw there. Talking about mesa-optimizers in such a context is just missing the point from a view in which humans can potentially be used as part of a toolchain in much the same way as robot arms or protein factories.

As the proposal stands it seems like the AI's predictions of human thoughts would offer no relevant information about how the AI is predicting the non-thought story content, since the AI could be predicting these different pieces of content through unrelated mechanisms.

Might depend whether the "thought" part comes before or after particular story text. If the "thought" comes after that story text, then it's generated conditional on that text, essentially a rationalization of that text from a hypothetical DM's point of view. If it comes before that sto... (read more)

I updated the post to note that if you want voting rights in Google, it seems you should buy $GOOGL not $GOOG. Sorry! Luckily they are about the same price, and you can easily dump your $GOOG for $GOOGL. In fact, it looks like $GOOGL is $6 cheaper than $GOOG right now? Perhaps because it is less liquid?

Fraud also seems like the kind of problem you can address as it comes up. And I suspect just requiring people to take a salary cut is a fairly effective way to filter for idealism.

All you have to do to distract fraudsters is put a list of poorly run software companies where you can get paid more money to work less hard at the top of the application ;-) How many fraudsters would be silly enough to bother with a fraud opportunity that wasn't on the Pareto frontier?

The problem comes when one tries to pour a lot of money into that sort of approach

It seems to me that the Goodhart effect is actually stronger if you're granting less money.

Suppose that we have a population of people who are keen to work on AI safety. Suppose every time a person from that population gets an application for funding rejected, they lose a bit of the idealism which initially drew them to the area and they start having a few more cynical thoughts like "my guess is that grantmakers want to fund X, maybe I should try to be more like X even th... (read more)

I think if you're in the early stages of a big project, like founding a pre-paradigmatic field, it often makes sense to be very breadth-first. You can save a lot of time trying to understand the broad contours of solution space before you get too deeply invested in a particular approach.

I think this can even be seen at the microscale (e.g. I was coaching someone on how to solve leetcode problems the other day, and he said my most valuable tip was to brainstorm several different approaches before exploring any one approach in depth). But it really shines ... (read more)

Yes, I tried it. It gave me a headache but I would guess that's not common. Think it's probably a decent place to start.

I didn't end up sticking to this because of various life disruptions. I think it was a bit helpful but I'm planning to try something more intensive next time.

Did you end up trying the microneedling? I'm curious about that route.

I'm glad you are thinking about this. I am very optimistic about AI alignment research along these lines. However, I'm inclined to think that the strong form of the natural abstraction hypothesis is pretty much false. Different languages and different cultures, and even different academic fields within a single culture (or different researchers within a single academic field), come up with different abstractions. See for example lsusr's posts on the color blue or the flexibility of abstract concepts. (The Whorf hypothesis might also be worth looking i... (read more)

Interesting, thanks for sharing.

I couldn't figure out how to go backwards easily.

Command-shift-g right?

I ended up using cmd+shift+i which opens the find/replace panel with the default set to backwards.

After practicing Vim for a few months, I timed myself doing the Vim tutorial (vimtutor on the command line) using both Vim with the commands recommended in the tutorial, and a click-and-type editor. The click-and-type editor was significantly faster. Nowadays I just use Vim for the macros, if I want to do a particular operation repeatedly on a file.

I think if you get in the habit of double-clicking to select words and triple-clicking to select lines (triple-click and drag to select blocks of code), click-and-type editors can be pretty fast.

This is a great experiment, I'll try it out too. I also have pretty decent habits for non-vim editing so it'll be interesting to see.

We present a useful toy environment for reasoning about deceptive alignment. In this environment, there is a button. Agents have two actions: to press the button or to refrain. If the agent presses the button, they get +1 reward for this episode and -10 reward next episode. One might note a similarity with the traditional marshmallow test of delayed gratification.

Are you sure that "episode" is the word you're looking for here?

I'm especially confused becaus... (read more)

Yes; episode is correct there—the whole point of that example is that, by breaking the episodic independence assumption, otherwise hidden non-myopia can be revealed. See the discussion of the prisoner's dilemma unit test in Krueger et al.'s “Hidden Incentives for Auto-Induced Distributional Shift” for more detail on how breaking this sort of episodic independence plays out in practice.
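The button environment quoted above is small enough to simulate directly. The sketch below is one reading of it, with a hypothetical `run` helper and policy signature: pressing pays +1 in the current episode and leaves a -10 penalty that lands in the next one.

```python
def run(policy, num_episodes):
    """Return per-episode rewards for a policy mapping episode index -> press (bool)."""
    rewards = []
    pending_penalty = 0  # penalty carried over from a press in the previous episode
    for t in range(num_episodes):
        pressed = policy(t)
        # The carried-over penalty arrives regardless of what the agent does now.
        rewards.append(pending_penalty + (1 if pressed else 0))
        pending_penalty = -10 if pressed else 0
    return rewards


always_press = run(lambda t: True, 3)   # what a per-episode-myopic agent would do
never_press = run(lambda t: False, 3)   # what an agent that cares across episodes would do
```

Within any single episode, pressing is always worth +1 more than refraining, so an agent that only cares about the current episode presses. An agent that refrains is revealing that it cares about reward beyond the episode boundary, which is the sense in which the environment breaks episodic independence.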

lsusr had an interesting idea of creating a new YouTube account and explicitly training the recommendation system to recommend particular videos (in his case, music):

I guess you could also do it for YouTube channels which are informative & entertaining, e.g. CGP Grey and Veritasium. I believe studies have found that laughter tends to be rejuvenating, so optimizing for videos you think are funny is another idea.

I suspect you will be most successful at this if you get in the habit of taking breaks away from your computer when you inevitably start to flag mentally. Some that have worked for me include: going for a walk, talking to friends, taking a nap, reading a magazine, juggling, noodling on a guitar, or just daydreaming.

Thanks for sharing your experiences and recommendations :) Going for a walk usually helps me out, and today was no exception (I walked almost 20,000 steps today split between two main walking sessions and misc daily tasks). I talked with friends while walking most of the time, that was a nice bonus. Right now I don't have access to my desktop (it is packed for moving) so have been working primarily off of my laptop: being able to simply close the lid and walk away when flagging or otherwise needing a break helps a lot and feels much more satisfying in the moment than clicking a few buttons to put my desktop to sleep.

...When we can state code that would solve the problem given a hypercomputer, we have become less confused. Once we have the unbounded solution we understand, in some basic sense, the kind of work we are trying to perform, and then we can try to figure out how to do it efficiently.

ASHLEY: Which may well require new insights into the structure of the problem, or even a conceptual revolution in how we imagine the work we're trying to do.

I'm not convinced your chess example, where the practical solution resembles the hypercomputer one, is representativ... (read more)

3 Optimization Process 3y
The understanding I came away with: there are (at least) three stages of understanding a problem: 1. You can't write a program to solve it. 2. You can write a cartoonishly wasteful program to solve it. 3. You can write a computationally feasible program to solve it. "Shuffle-sort" achieves the second level of knowledge re: sorting lists. Yeah, it's cartoonishly wasteful, and it doesn't even resemble any computationally feasible sorting algorithm (that I'm aware of) -- but, y'know, viewed through this lens, it's still a huge step up from not even understanding "sorting" well enough to sort a list at all. (Hmm, only marginally related but entertaining: if you reframe the problem of epistemology not as sequence prediction, but as "deduce what program is running your environment," then a Solomonoff inductor can be pretty fairly described as "consider every possible object of type EnvironmentProgram; update its probability based on the sensory input; return the posterior PDF over EnvironmentProgram-space." The equivalent program for list-sorting is "consider every possible object of type List<Int>; check if (a) it's sorted, and (b) it matches the element-counts of the input-list; if so, return it." Which is even more cartoonishly wasteful than shuffle-sort. Ooh, and if you want to generalize to cases where the list-elements are real numbers, I think you get/have to include something that looks a lot like Solomonoff induction, forcing countability on the reals by iterating over all possible programs that evaluate to real numbers (and hoping to God that whatever process generated the input list, your mathematical-expression-language is powerful enough to describe all the elements).)
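Both wasteful sorts are easy to write down, which is rather the point of stage 2. The sketch below uses hypothetical function names, and `enumerate_sort` is a bounded stand-in for "consider every possible List<Int>": it assumes candidates have the input's length and draw values from the input's min-to-max range, since enumerating all integer lists would never terminate in practice.

```python
import itertools
import random
from collections import Counter


def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


def shuffle_sort(xs, rng=random):
    """Shuffle until the list happens to be sorted (terminates with probability 1)."""
    xs = list(xs)
    while not is_sorted(xs):
        rng.shuffle(xs)
    return xs


def enumerate_sort(xs):
    """Try every candidate list; return the first that is sorted and has matching counts."""
    if not xs:
        return []
    target = Counter(xs)
    values = range(min(xs), max(xs) + 1)  # assumption: bound the search space
    for candidate in itertools.product(values, repeat=len(xs)):
        if is_sorted(candidate) and Counter(candidate) == target:
            return list(candidate)


print(shuffle_sort([3, 1, 2], random.Random(0)))
print(enumerate_sort([3, 1, 2, 1]))
```

Neither function resembles a practical algorithm, but each encodes a complete specification of what "sorted" means: ordered, and a rearrangement of the input.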

From a safety standpoint, hoping and praying that SGD won't stumble across lookahead doesn't seem very robust, if lookahead represents a way to improve performance. I imagine that whether SGD stumbles across lookahead will end up depending on complicated details of the loss surface that's being traversed.

1Jack R3y
I agree, and thanks for the reply. And I agree that even a small chance of catastrophe is not robust. Though I asked because I still care about the probability of things going badly, even if I think that probability is worryingly high. Though I see now (thanks to you!) that in this case our prior that SGD will find look-ahead is still relatively high and that belief won't change much by thinking about it more due to sensitivity to complicated details we can't easily know.

Lately I've been examining the activities I do to relax and how they might be improved. If you haven't given much thought to this topic, Meaningful Rest is excellent background reading.

An interesting source of info for me has been lsusr's posts on cutting out junk media: 1, 2, 3. Although I find lsusr's posts inspiring, I'm not sure I want to pursue the same approach myself. lsusr says: "The harder a medium is to consume (or create, as applicable) the smarter it makes me." They responded to this by cutting all the easy-to-consume media out of their lif... (read more)

Good to know! I was thinking the application process would be very transparent and non-demanding, but maybe it's better to ditch it altogether.

Related to the earlier discussion of weighted voting allegedly facilitating groupthink:

An interesting litmus test for groupthink might be: What has LW changed its collective mind about? By that I mean: the topic was discussed on LW, there was a particular position on the issue that was held by the majority of users, new evidence/arguments came in, and now there's a different position which is held by the majority of users. I'm a bit concerned that nothing comes to mind which mee... (read more)

  • Replication Crisis definitely hit hard. Lots of stuff there. 
  • People's timelines have changed quite a bit. People used to plan for 50-60 years, now it's much more like 20-30 years. 
  • Bayesianism is much less the basis for stuff. I think this one is still propagating, but I think Embedded Agency had a big effect here, at least on me and a bunch of other people I know.
  • There were a lot of shifts on the spectrum "just do explicit reasoning for everything" to "figuring out how to interface with your System 1 sure seems really important". I think Eliezer
... (read more)

I feel like there was a mass community movement (not unanimous but substantial) from AGI-scenarios-that-Eliezer-has-in-mind to AGI-scenarios-that-Paul-has-in-mind, e.g. more belief in slow takeoff + multipolar + "What Failure Looks Like" and less belief in fast takeoff + decisive strategic advantage + recursive self-improvement + powerful agents coherently pursuing misaligned goals. This was mostly before my time, I could be misreading things, that's just my impression. :-)

Sequences: Beisutsukai
One year later: Extreme Rationality: It's Not That Great
6Yoav Ravid3y
Priming? Though that does feel like a fairly weak example.

For whatever it's worth, I believe I was the first to propose weighted voting on LW, and I've come to agree with Czynski that this is a big downside. Not necessarily enough to outweigh the upsides, and probably insufficient to account for all the things Czynski dislikes about LW, but I'm embarrassed that I didn't foresee it as a potential problem. If I was starting a new forum today, I think I'd experiment with no voting at all -- maybe try achieving quality control by having an application process for new users? Does anyone have thoughts about that?

The 'application process' used by Overcoming Bias back in the day, namely 'you have to send an email with your post and name', would probably be entirely sufficient. It screens out almost everyone, after all. But in actuality, what I'd most favor would be everyone maintaining their own blog and the central repository being nothing but a blogroll. Maybe allow voting on the blogroll's ordering.
Personally, I am allergic to application processes. Especially opaque ones. I likely would have never joined this website if there was an application process for new users. I don't think the site is too crowded with bad content right now, though that's certainly a potential problem if more people choose to write posts. If lots more people flood this site with low quality posts then an alternative solution could be to just tighten the frontpage criteria. For context: I was not part of Less Wrong 1.0. I have only known Less Wrong 2.0.
IMO the thing voting is mostly useful for is sorting content, not users. You might imagine me writing twenty different things, and then only some of them making it in front of the eyes of most users, and this is done primarily through people upvoting and downvoting to say "I want to see more/less content like this", and then more/less people being shown that content. Yes, this has first-mover problems and various other things, but so do things like 'recent discussion' (where the number of comments that are spawned by something determines its 'effective karma'). Now, in situations where all the users see all the things, I don't think you need this sort of thing--but I'm assuming LW-ish things are hoping to be larger than that scale.

Another possible AI parallel: Some people undergo a positive feedback loop where more despair leads to less creativity, less creativity leads to less problem-solving ability (e.g. P100 thing), less problem-solving ability leads to a belief that the problem is impossible, and a belief that the problem is impossible leads to more despair.

China's government is more involved in large-scale businesses.

According to the World Economic Forum website:

China is home to 109 corporations listed on the Fortune Global 500 - but only 15% of those are privately owned.

Like, maybe depending on the viewer history, the best video to polarize the person is different, and the algorithm could learn that. If you follow that line of reasoning, the system starts to make better and better models of human behavior and how to influence them, without having to "jump out of the system" as you say.

Makes sense.

...there's a lot of content on YouTube about YouTube, so it could become "self-aware" in the sense of understanding the system in which it is embedded.

I think it might be useful to distinguish between being aware of onesel... (read more)

I suspect the best way to think about the polarizing political content thing which is going on right now is something like: The algorithm knows that if it recommends some polarizing political stuff, there's some chance you will head down a rabbit hole and watch a bunch more vids. So in terms of maximizing your expected watch time, recommending polarizing political stuff is a good bet. "Jumping out of the system" and noticing that recommending polarizing videos also polarizes society as a whole and gets them to spend more time on YouTube on a macro level ... (read more)

I agree, but I think you can have problems (and even Predict-O-Matic like problems) without reaching that different sort of reasoning. Like, maybe depending on the viewer history, the best video to polarize the person is different, and the algorithm could learn that. If you follow that line of reasoning, the system starts to make better and better models of human behavior and how to influence them, without having to "jump out of the system" as you say. One could also argue that because YouTube videos contain so much info about the real world, a powerful enough algorithm using them can probably develop a pretty good model of the world. And there's a lot of content on YouTube about YouTube, so it could become "self-aware" in the sense of understanding the system in which it is embedded. Agreed, this is more the kind of problem that emerges from RL-like training. The page on the Tournesol wiki about this subject points to this recent paper that proposes a recommendation algorithm tried in practice on YouTube. AFAIK we don't have access to the actual algorithm used by YouTube, so it's hard to say whether it's using RL; but the paper above looks like evidence that it eventually will be.

Not sure if this answers, but the book Superforecasting explains, among other things, that probabilistic thinkers tend to make better forecasts.

Yes, I didn't say "they are not considering that hypothesis", I am saying "they don't want to consider that hypothesis". Those do indeed imply very different actions. I think one very naturally gives rise to producing counterarguments, the other one does not.

They don't want to consider the hypothesis, and that's why they'll spend a bunch of time carefully considering it and trying to figure out why it is flawed?

In any case... Assuming the Twitter discussion is accurate, some people working on AGI have already thought about the "alignment is hard" positi... (read more)

What? What about all the people who prefer to do fun research that builds capabilities and has direct ways to make them rich, without having to consider the hypothesis that maybe they are causing harm?

If they're not considering that hypothesis, that means they're not trying to think of arguments against it. Do we disagree?

I agree if the government was seriously considering regulation of AI, the AI industry would probably lobby against this. But that's not the same question. From a PR perspective, just ignoring critics often seems to be a good strategy.

Yes, I didn't say "they are not considering that hypothesis", I am saying "they don't want to consider that hypothesis". Those do indeed imply very different actions. I think one very naturally gives rise to producing counterarguments, the other one does not. I am not really sure what you mean by the second paragraph. AI is being actively regulated, and there are very active lobbying efforts on behalf of the big technology companies, producing large volumes of arguments for why AI is nothing you have to worry about.