Recent Discussion

The technique

This post is signal-boosting and recommending a strategy for improving your decision making that I picked up from the entrepreneur Ivan Mazour. He describes the process here, and publishes his own results every year on his blog.

In his words...

I believe that life is far too fast-paced for us to be able to make rational, carefully thought-out, decisions all the time. This completely contradicts my mathematical upbringing and training, but is something I have come to realise throughout my twenties. We need a way of keeping up with the constant barrage of decisions, even when the i
... (Read more)
As Few As Possible

All of economics, every last bit of it, is about scarcity. About what is scarce and what is not, and about who does and who doesn’t get their needs (and sometimes wants) satisfied.

Much of the debate about healthcare is in fact a scarcity problem. There aren’t enough practitioners in practice to handle every patient, so some people don’t get doctors. It’s a combination of self-selection where people who can’t afford to take time off to have an ingrown toenail treated professionally hack at it with a pocketknife instead, and insurance selection where granny... (Read more)

You said "More healthcare isn't always better".

Can you give a central example of a situation where more people receiving healthcare is worse, and why we should characterize that situation as one where more people receive healthcare?

If the government restricts the supply of meat (and food generally is adequately distributed), then the finite supply of meat makes it positional, and the fact that all needs are being met (within the scope of the example, at least) makes the outcome satisfactory.

If you thought that I intended some element of "maximize production even if all needs have already been met", then we have completely failed to communicate.

Here are six cases where I was pretty confident in my understanding of the microeconomics of something, but then later found out I was missing an important consideration.

Thanks to Richard Ngo and Tristan Hume for helpful comments.

Here’s the list of mistakes:

  • I thought divesting from a company had no effect on the company.
  • I thought that the prices on a prediction market converged to the probabilities of the underlying event.
  • I thought that I shouldn’t expect to be able to make better investment decisions than buying index funds.
  • I had a bad understanding of externalities, which was i
... (Read more)
1Czynski42mIn an immediate but not useful sense, from your bank, because while the calculations that lead them to determine that this mortgage is a good investment involve other actors, none of the interaction with other actors is instantaneous. On the time scale of a month (and probably a week), partially from the company that bought your mortgage for its risk-adjusted net present value (let's just call that V), and the rest from 'nowhere', i.e. the amount they're now allowed to lend by the fractional reserve banking regulations based on the fact that their reserves have just increased by V. Trying to work out which of those portions is bigger is making my head hurt, mostly because I'm pretty sure the relative value of the risk-adjusted net present value of the mortgage's future cash flow, compared with the reserve fraction and the actual face value of the mortgage, matters, but I'm not quite sure how.
2romeostevensit40mYes, though the reserve requirement was recently dropped to zero.
1Czynski34mWat. Is there some more complicated picture, there, where they don't have to maintain real reserves but they do need to hold something like overnight loans from the Fed as reserves? Because if they're really untethered from reserves of any kind, they can 'print' as much money as they want, which seems... It's not necessarily a terrible idea, but it does entail that the central bank give up its monopoly, and that seems like an insane move from the central bank's perspective.

I don't know the full picture on that, it confuses me too.
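
For readers trying to follow the mechanics, here is a minimal sketch of the textbook money-multiplier story the thread is gesturing at. It is a simplified illustration of my own (ignoring capital requirements and other constraints that still bind lending in practice, which is part of why a 0% reserve requirement doesn't literally mean unlimited money creation):

```python
# Textbook money-multiplier sketch (simplified; ignores capital requirements,
# which still constrain lending even with a 0% reserve requirement).

def max_new_lending(initial_reserves: float, reserve_fraction: float, rounds: int = 1000) -> float:
    """Sum the geometric series of re-deposited loans: V*(1-r) + V*(1-r)^2 + ..."""
    if reserve_fraction <= 0:
        return float("inf")  # with r = 0 the textbook multiplier is unbounded
    total, deposit = 0.0, initial_reserves
    for _ in range(rounds):
        loanable = deposit * (1 - reserve_fraction)
        total += loanable
        deposit = loanable  # the loan gets re-deposited somewhere in the banking system
    return total  # approaches initial_reserves * (1 - r) / r

print(max_new_lending(100_000, 0.10))  # ~900,000 of new loans supported by 100k of new reserves
```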

The Basic Double Crux pattern

I’ve spent a lot of time developing tools and frameworks for bridging “intractable” disagreements. I’m also the person affiliated with CFAR who has taught Double Crux the most, and done the most work with it.

People often express to me something to the effect of:

The important thing about Double Crux is all the low level habits of mind: being curious, being open to changing your mind, paraphrasing to check that you’ve understood, operationalizing, etc. The ‘Double Crux’ framework itself is not very important.

I half agree with that sentiment. I do think that the low level cognitive and conversation... (Read more)

This comment helped me articulate something that I hadn't quite put my finger on before.

There are actually two things that I want to stand up for, which are, from a naive perspective, in tension. So I think I need to make sure not to lump them together.

On the one hand, yeah, I think it is deeply true that you can unilaterally do the thing, and with sufficient skill, you can make "the Double Crux thing" work, even with a person who doesn't explicitly opt in for that kind of discourse (because curiosity and empathy are contagious, and many... (read more)

4elityre2hFor context, Mark participated in a session I ran via Zoom last weekend that covered this pattern. For what it's worth, that particular conversation is the main thing that caused me to add a paragraph about distillation (even just as a bookmark) to the OP. I'm not super confident what would have most helped there, though.
4elityre2hGar. I thought I finally got images to work on a post of mine.
4Raemon2hI live in the bubble that literally invented Doublecrux, so, it indeed is not surprising if my experience doesn’t generalize.

In this competition, we (Ought) want to amplify Rohin Shah’s forecast for the question: When will a majority of AGI researchers agree with safety concerns? Rohin has provided a prior distribution based on what he currently believes, and we want others to:

  1. Try to update Rohin’s thinking via comments (for example, comments including reasoning, distributions, and information sources). If you don’t want your comment to be considered for the competition, label it ‘aside’
  2. Predict what his posterior distribution for the question will be after he has read all the comme
... (Read more)

I think the following is underspecified:

Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?

What counts as building superintelligent AGI?

This could mean anything from working on foundational theory which could be used to facilitate building an AGI, to finishing the final phase of training on a fully functional AGI implementation.

In the former case you're going to get close to 0% agreement. In the latter, well over 50% already (I hope!).

I don't see a... (read more)

2elifland3hI think it's >1% likely that one of the first few surveys Rohin conducted would result in a fraction of >0.5. Evidence from When Will AI Exceed Human Performance? [https://arxiv.org/pdf/1705.08807.pdf#page=13], in the form of median survey responses of researchers who published at ICML and NIPS in 2015:

  • 5% chance given to Human Level Machine Intelligence (HLMI) having an extremely bad long-run impact (e.g. human extinction)
  • Does Stuart Russell's argument for why highly advanced AI might pose a risk point at an important problem? 39% say at least important, 70% at least moderately important.
  • But on the other hand, only 8.4% said working on this problem now is more valuable than other problems in the field. 28% said as valuable as other problems.
  • 47% agreed that society should prioritize "AI Safety Research" more than it currently does.

These seem like fairly safe lower bounds compared to the population of researchers Rohin would evaluate, since concern regarding safety has increased since 2015 and the survey included all AI researchers rather than only those whose work is related to AGI. These responses are more directly related to the answer to Question 3 ("Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?") than Question 2 ("Does X broadly understand the main concerns of the safety community?"). I feel very uncertain about the percentage that would pass Question 2, but think it is more likely to be the "bottleneck" than Question 3. Given these considerations, I increased the probability before 2023 to 10%. I moved the median | not never up to 2035, as a higher probability pretty soon also means a sooner median. I decreased the probability of “never” to 20%, since the “not enough people update on it / consensus building takes forever / the population I chose just doesn't pay attention to safety for some reason” condition seems
1Beth Barnes3hYeah I also thought this might just be true already, for similar reasons
2Benjamin Rachbach6hMy distribution [https://elicit.ought.org/builder/tBZpx5MVT]. My biggest differences with Rohin's prior distribution are:

  1. I think that it's much more likely than he does that AGI researchers already agree with safety concerns
  2. I think it's considerably more likely than he does that the majority of AGI researchers will never agree with safety concerns

These differences are explained more on my distribution and in my other comments. The next step that I think would help the most to make my distribution better would be to do more research [https://www.lesswrong.com/posts/Azqmzp5JoXJihMcr4/amplify-rohin-s-prediction-on-agi-researchers-and-safety?commentId=BjwHryhtEptfmoB5z].

Twitter thread by Eliezer Yudkowsky, with the bounty in bold: 

So I don't want to sound alarms prematurely, here, but we could possibly be looking at the first case of an AI pretending to be stupider than it is. In this example, GPT-3 apparently fails to learn/understand how to detect balanced sets of parentheses. 

Now, it's possible that GPT-3 "legitimately" did not understand this concept, even though GPT-3 can, in other contexts, seemingly write code or multiply 5-digit numbers. But it's also possible that GPT-3, playing the role of John, predicted that *John* wouldn't learn it.

It's

... (Read more)
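
For reference, the ground truth for the task in the thread is tiny; a minimal balanced-parentheses checker (my own sketch, not the prompt used in the thread) looks like this:

```python
def is_balanced(s: str) -> bool:
    """Ground truth for the task GPT-3 was prompted on: are the parentheses balanced?"""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing paren with nothing left to match
                return False
    return depth == 0

assert is_balanced("(()())") and not is_balanced("())(")
```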

I don't think this is that crazy of a request. Many other fields of machine learning have robust visualizations that hazard a guess at what the AI is "thinking." I haven't seen an equivalent for Transformer-based NLP models, but why not?
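
As a rough sketch of where such a visualization could start: transformer language models expose per-layer, per-head attention weights. Assuming the Hugging Face transformers library, with GPT-2 standing in for GPT-3 (whose weights aren't public), extracting them looks roughly like this:

```python
# Rough sketch: pull attention maps out of a small GPT-2 model as a starting
# point for "what is the model attending to" visualizations. Assumes the
# Hugging Face `transformers` library; GPT-2 stands in for GPT-3.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)

text = "The parentheses (( )) are balanced."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
last_layer = outputs.attentions[-1][0]          # (num_heads, seq_len, seq_len)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for head in range(last_layer.shape[0]):
    top = last_layer[head, -1].argmax().item()  # token the final position attends to most
    print(f"head {head}: last token attends most to {tokens[top]!r}")
```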

6ESRogs8hThis does seem like an interesting question. But I think we should be careful to measure against the task we actually asked the system to perform. For example, if I ask my system to produce a cartoon drawing, it doesn't seem very notable if I get a cartoon as a result rather than a photorealistic image, even if it could have produced the latter. Maybe what this just means is that we should track what the user understands the task to be. If the user thinks of it as "play a (not very smart) character who's asked to do this task", they'll have a pretty different understanding of what's going on than if they think of it as "do this task." I think what's notable in the example in the post is not that the AI is being especially deceptive, but that the user is especially likely to misunderstand the task (compared to tasks that don't involve dialogues with characters).
2Vaniver5hConsider instead the scenario where I show you a photo of a face, and you produce a photo of the side of the face. An interesting question is "is there a 3d representation of the face in the model?". It could be getting the right answer that way, or it could be getting it some other way. Similarly, when it models a 'dumb' character, is it calculating the right answer, and then computing an error? Or is it just doing something dumb, which incidentally turns out to be wrong? Like, when you look at this example [https://twitter.com/ESYudkowsky/status/1285663436299546624]: How did it come up with 19 and 20? What would it take to make tools that could answer that question?
13Kaj_Sotala9h"It's tempting to anthropomorphize GPT-3 as trying its hardest to make John smart" seems obviously incorrect if it's explicitly phrased that way, but e.g. the "Giving GPT-3 a Turing Test [http://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html]" post seems to implicitly assume something like it: The author says that this "stumps" GPT-3, which "doesn't know how to" say that it doesn't know. That's as if GPT-3 was doing its best to give "smart" answers, and just was incapable of doing so. But Nick Cammarata showed [https://twitter.com/nicklovescode/status/1284050958977130497] that if you just give GPT-3 a prompt where nonsense answers are called out as such, it will do just that.

LessWrong has been around for 10+ years, CFAR's been at work for around 6, and I think there have been at least a few other groups or individuals working on what I think of as the "Human Rationality Project."

I'm interested in hearing, especially from people who have invested significant time in attempting to push the rationality project forward, what they consider to be the major open questions facing the field. (More details in this comment)

"What is the Rationality Project?"

I'd prefer to leave "Rationality Project" somewhat vague, but I'd roughly summarize it ... (Read more)

Even assuming you're correct, postrationalism won't help with any of that because it's nothing but systematized self-delusion. Rationality may not have benefits as huge as one would naively expect, but it is still substantially better than deliberately turning your back on even attempting to be rational, which is what postrationalism does - intentionally!

So there’s this thing where GPT-3 is able to do addition, it has the internal model to do addition, but it takes a little poking and prodding to actually get it to do addition. “Few-shot learning”, as the paper calls it. Rather than prompting the model with

Q: What is 48 + 76? A:

… instead prompt it with

Q: What is 48 + 76? A: 124

Q: What is 34 + 53? A: 87

Q: What is 29 + 86? A:

The same applies to lots of other tasks: arithmetic, anagrams and spelling correction, translation, assorted benchmarks, etc. To get GPT-3 to do the thing we want, it helps to give it a few examples, so it can “figure out wh... (Read more)
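
A minimal sketch of what assembling such a few-shot prompt looks like programmatically (the Q:/A: formatting mirrors the examples above; the API call itself is omitted, since access details vary):

```python
import random

def few_shot_addition_prompt(query_a: int, query_b: int, n_examples: int = 2) -> str:
    """Build a GPT-3-style few-shot prompt for addition, as in the examples above."""
    lines = []
    for _ in range(n_examples):
        a, b = random.randint(10, 99), random.randint(10, 99)
        lines.append(f"Q: What is {a} + {b}? A: {a + b}")
    lines.append(f"Q: What is {query_a} + {query_b}? A:")
    return "\n\n".join(lines)

print(few_shot_addition_prompt(29, 86))
# The model is then asked to continue the text after the final "A:".
```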

6dxu3hGPT-3 and systems like it are trained to mimic human discourse. Even if (in the limit of arbitrary computational power) it manages to encode an implicit representation of human values somewhere in its internal state, in actual practice there is nothing tying that representation to the phrase "human values", since moral philosophy is written by (confused) humans, and in human-written text the phrase "human values" is not used in the consistent, coherent manner that would be required to infer its use as a label for a fixed concept.
2John_Maxwell2hThis is essentially the "tasty ice cream flavors" problem, am I right? Trying to check if we're on the same page. If so: John Wentworth said So how about instead of talking about "human values", we talk about what a particular moral philosopher endorses saying or doing, or even better, what a committee of famous moral philosophers would endorse saying/doing.
2dxu3hOn "conceding the point": The thesis that values are fragile doesn't have anything to do with how easy it is to create a system that models them implicitly, but with how easy it is to get an arbitrarily intelligent agent to behave in a way that preserves those values. The difference between those two things is analogous to the difference between a prediction task and a reinforcement learning task, and your argument (as far as I can tell) addresses the former, not the latter. Insofar as my reading of your argument is correct, there is no point to concede.

On gwern's article: I'm not sure how to respond to this, except to state that neither this specific claim nor anything particularly close to it appears in the article I linked.

On Tool AI: As far as I'm aware, this point has never been the subject of much dispute.

This is still arguable; I have my doubts, but in a "big picture" sense this is largely irrelevant to the greater point, which is: This is (and remains) the crux. I still don't see how GPT-3 supports this claim!

Just as a check that we're on the same page: when you say "value-loading problem", are you referring to something more specific than the general issue of getting an AI to learn and behave according to our values?

***

META: I can understand that you're frustrated about this topic, especially if it seems to you that the "MIRI-sphere" (as you called it in a different comment) is persistently refusing to acknowledge something that appears obvious to you. Obviously, I don't agree with that characterization, but in general I don't want to engage in a discussion that one side is finding increasingly unpleasant, especially since that often causes the discussion to rapidly deteriorate in quality after a few replies. As such, I want to explicitly and openly relieve you of any social obligation you may have felt to reply to this comment. If you feel that your time would be better spent elsewhere, please do!

The thesis that values are fragile doesn't have anything to do with how easy it is to create a system that models them implicitly, but with how easy it is to get an arbitrarily intelligent agent to behave in a way that preserves those values. The difference between those two things is analogous to the difference between a prediction task and a reinforcement learning task, and your argument (as far as I can tell) addresses the former, not the latter. Insofar as my reading of your argument is correct, there is no point to concede.

If you can solve the prediction task, you can probably use the solution to create a reward function for your reinforcement learner.
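
A schematic sketch of that construction (the names here are placeholders of mine, not an existing API):

```python
# Schematic sketch of "use a solved prediction task as a reward function".
# `predictive_model` is a placeholder for a model that, given a state and a
# proposed action, predicts how a trusted human (or committee) would rate it.

def make_reward_function(predictive_model):
    def reward(state, action) -> float:
        # Higher predicted human approval -> higher reward for the RL agent.
        return predictive_model.predicted_approval(state, action)
    return reward

# An RL loop would then optimize against this learned reward, which is exactly
# where the prediction-vs-reinforcement-learning worry in the parent comment
# bites: the agent may exploit errors in the predictor rather than satisfy the
# underlying values.
```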

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the sixth section in the reading guide: Intelligence explosion kinetics. This corresponds to Chapter 4 in the book, of a similar name. This section is about how fast a human-level artificial intelligence might become superintelligent.

This post summarizes the section, and offers a few relevant notes, a... (Read more)

FWIW I think this 'milestone' is much less clear than Bostrom makes it sound. I'd imagine there's a lot of variation in fidelity of simulation, both measured in terms of brain signals and in terms of behaviour, and I'd be surprised if there were some discrete point at which everybody realised that they'd got it right.

The EMH Aten't Dead

Cross-posting from my personal blog, but written primarily for Less Wrong after recent discussion here.


There are whispers that the Efficient-Market Hypothesis is dead. Eliezer's faith has been shaken. Scott says EMH may have been the real victim of the coronavirus.

The EMH states that “asset prices reflect all available information”. The direct implication is that if you don’t have any non-available information, you shouldn’t expect to be able to beat the market, except by chance.

But some people were able to preempt the corona crash, without ... (Read more)

I'm probably missing something - but when you say the vast majority of active investors do really badly, shouldn't that be impossible too? If markets are truly efficient, isn't it just as hard to underperform them as outperform?

Eli's shortform feed

I'm mostly going to use this to crosspost links to my blog for less polished thoughts, Musings and Rough Drafts.

A hierarchy of behavioral change methods

Follow up to, and a continuation of the line of thinking from: Some classes of models of psychology and psychological change

Related to: The universe of possible interventions on human behavior (from 2017)

This post outlines a hierarchy of behavioral change methods. Each of these approaches is intended to be simpler, more lightweight, and faster to use (is that right?) than the one that comes after it. On the flip side, each of these approaches is intended to resolve a common major blocker of the approach before... (read more)

GPT-3 Gems

GPT-3 is captivating, and not just because of its potential reasoning abilities. This post will be a living collection of my favorite experiences with the network. 

Bold text is my input; square brackets contain my commentary.

Long-form Writing

Beisutsukai Class Project: Solve Intent Alignment

"I've been thinking about this for far longer than you have. Building an AI is like building a person; it's not just a matter of throwing some parts together and suddenly you have a sentient being that does exactly what you want. AI design is a delicate art that requires hundreds of precise calibrations

... (Read more)

We take the web for granted, but maybe we shouldn't. It's very large and nobody can read it all. There are many places we haven't been that probably have some pretty good writing. I wonder to what extent GPT-3 can be considered a remix of the web that makes it seem magical again, revealing aspects of it that we don't normally see. When I see writing like this, I wonder what GPT-3 saw in the web corpus. Is there an archive of Tolkien fanfic that was included in the corpus? An undergrad physics forum? Conversations about math and computer science?

4Chris_Leong4hWhat's the bold text? How do we tell what you typed and what the API generated?
2TurnTrout4hThe bold is my input.

The July Cambridge Less Wrong / Slate Star Codex (RIP) meetup will be held online due to the plague.

Hangouts link

I've been reading a fair bit about "worse than death" scenarios from AGI (e.g. posts like this), and the intensities and probabilities of them. I've generally been under the impression that the worst-case scenarios have extremely low probabilities (i.e. would require some form of negative miracle to occur) and can be considered a form of Pascal's mugging.

Recently, however, I came across this post on OpenAI's blog. The blog post notes the following:

Bugs can optimize for bad behavior
One of our code refactors introduced a bug which flipped the sign of the reward. Fl
... (Read more)
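
The failure mode quoted above is mechanically simple; a toy illustration (my own sketch, not OpenAI's code) of how a one-character sign flip inverts what the optimizer pursues:

```python
# Toy illustration of a sign-flip bug (not OpenAI's actual code): the agent
# maximizes whatever number the reward function returns, so negating it makes
# the optimizer actively seek out the behavior you meant to penalize.

def intended_reward(toxicity_score: float) -> float:
    return -toxicity_score        # penalize toxic outputs

def buggy_reward(toxicity_score: float) -> float:
    return toxicity_score         # refactor flipped the sign: toxicity is now rewarded

candidates = {"polite reply": 0.1, "maximally toxic reply": 0.9}
best_intended = max(candidates, key=lambda c: intended_reward(candidates[c]))
best_buggy = max(candidates, key=lambda c: buggy_reward(candidates[c]))
print(best_intended, "|", best_buggy)  # 'polite reply' | 'maximally toxic reply'
```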

Sorry for the dumb question a month after the post, but I've just found out about deceptive alignment. Do you think it's plausible that a sign-flipped AGI could fake being an FAI in the training stage, just to take a treacherous turn at deployment?

How good is humanity at coordination?

When EAs look at the history of nuclear weapons, their reactions tend to fall into two camps.

The first camp (which I am inclined towards) is “Man, what a total mess. There were so many near misses, and people involved did such clearly terrible and risky things like setting up the dead hand system and whatever else. I guess that humans probably can’t be trusted to handle extremely dangerous technology.”

The other camp says “No nuclear weapons have been used or detonated accidentally since 1945. This is the optimal outcome, so I guess this is evidence that humanity is ... (Read more)

2Lanrian9hThere's probably some misunderstanding, but I'm not immediately spotting it when rereading. You wrote: Going by the parent comment, I'm interpreting this as

  • it = "we didn't observe nukes going off"
  • X = "humans are competent at handling dangerous technology"

I think that

  • SIA thinks that "we didn't observe nukes going off" is relatively stronger evidence for "humans are competent at handling dangerous technology" (because SIA ignores observer selection effects, and updates naively).
  • SSA thinks that "we didn't observe nukes going off" is relatively weaker evidence for "humans are competent at handling dangerous technology" (because SSA doesn't update against hypotheses which would kill everyone).

Which seems to contradict what you wrote?
2Pongo9hYep, sorry, looks like we do disagree. Not sure I'm parsing your earlier comment correctly, but I think you say "SIA says there should be more people everywhere, because then I'm more likely to exist. More people everywhere means I think my existence is evidence for people handling nukes correctly everywhere". I'm less sure what you say about SSA, either "SSA still considers the possibility that nukes are regularly mishandled in a way that kills everyone" or "SSA says you should also consider yourself selected from the worlds with no observers". Do I have you right? I say, "SIA says that if your prior is '10% everyone survives, 20% only 5% survive, 70% everyone dies', and you notice you're in a 'survived' world, you should think you are in the 'everyone survives' world with 90% probability (as that's where 90% of the probability-weighted survivors are)".
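
The 90% figure follows from a one-line survivor-weighted Bayes calculation; a sketch of the arithmetic as described in the comment above:

```python
# Sketch of the SIA update described above: weight each world by its prior
# probability times the fraction of observers who survive in it.
priors = {"everyone survives": 0.10, "5% survive": 0.20, "everyone dies": 0.70}
survivor_fraction = {"everyone survives": 1.0, "5% survive": 0.05, "everyone dies": 0.0}

weights = {w: priors[w] * survivor_fraction[w] for w in priors}
total = sum(weights.values())                      # 0.10 + 0.01 + 0.00 = 0.11
posterior = {w: weights[w] / total for w in priors}
print(posterior["everyone survives"])              # ~0.909, i.e. the "90%" in the comment
```
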
3rohinmshah12hOkay, fair enough, though I want to note that the entire disagreement in the post is about the backward-looking sense (if I'm understanding you correctly). Like, the question is how to interpret the fact that there were a lot of near misses, but no catastrophes (and for the class of nukes / bio etc. a 10% catastrophe seems way more likely than a 90-100% catastrophe).
Okay, fair enough, though I want to note that the entire disagreement in the post is about the backward-looking sense (if I'm understanding you correctly).

Oh interesting! I suspect you understand me correctly and we disagree. To elaborate:

If it means something for humans to be "good at coordination", it's that there's some underlying features that cause humans to succeed rather than fail at coordination challenges. If I said someone was "good at winning poker games", I don't just mean that they happened to win once,... (read more)

In light of reading Hazard's Shortform Feed -- which I really enjoy -- based on Raemon's Shortform feed, I'm making my own. There be thoughts here. Hopefully, this will also get me posting more.

3Hazard5hOne way I think about things. Everything that I've found in myself and close friends that looks and smells like "shoulds" is sorta sneaky. I keep on finding shoulds which seem to have been absorbed from others and are less about "this is a good way to get a thing in the world that I want" and more about "someone said you need to follow this path and I need them to approve of me". The force I feel behind my shoulds is normally "You're SCREWED if you don't!", a sort of vaguely panicky, inflexible energy. It's rarely connected to the actual good qualities of the thing I "should" be doing. Because my shoulds normally ground out in "if I'm not this way, people won't like me", if the pressure gets turned up, following a should takes me farther and farther away from things I actually care about. Unblocking stuff often feels like transcending the panicky fear that hides behind a should. It never immediately lets me be awesome at stuff. I still need to develop a real connection to the task and how it works into the rest of my life. There's still drudgery, but it's dealt with from a calmer place.

Yes I can relate to this!

3mr-hire7hI think removing internal conflicts is "powerful but not sufficient." The people who are most productive are also great at amplifying external conflicts. That is, they have a clear, strong vision, and amplify the creative tension between what they have and what they know they can have. This can help you do things that are not "fun," like deliberate practice, but are totally aligned, in that you have no objections to doing them, and have a stance of acceptance towards the things that are not enjoyable. The best then augment that with powerful external structures that are supportive of their ideal internal states and external behaviors. Each one of these taken far enough can be powerful, and when combined together they are more than the sum of their parts.
1NaiveTortoise5hThanks, this framing is helpful for me for understanding how these things can be seen to fit together.
Open & Welcome Thread - July 2020

If it’s worth saying, but not worth its own post, here's a place to put it. (You can also make a shortform post)

And, if you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ.

The Open Thread sequence is here.

4steve21528hYeah. There's no gradient descent within a single episode, but if you have a network with input (as always) and with memory (e.g. an RNN) then its behavior in any given episode can be a complicated function of its input over time in that episode, which you can describe as "it figured something out from the input and that's now determining its further behavior". Anyway, everything you said is right, I think.
10Zack_M_Davis9hInterested. Be sure to check out Gwern's page on embryo selection [https://www.gwern.net/Embryo-selection] if you haven't already.
7ChaseG6hThis post is an absolute gold mine. Thank you!

I am getting worried that people are having so much fun doing interesting stuff with GPT-3 and AI Dungeon that they're forgetting how easy it is to fool yourself. Maybe we should think about how many different cognitive biases are in play here? Here are some features that make it particularly easy during casual exploration.

First, it works much like autocomplete, which makes it the most natural thing in the world to "correct" the transcript to be more interesting. You can undo and retry, or trim off extra text if it generates more than you want.

Randomness is turned on by default... (Read more)

Being highly skeptical of this GPT-3 "research" myself, let me make a meta-contrarian argument in favor of ways that we could do more constructive GPT-3 research, without letting the perfect be the enemy of the good.

One way is to try to develop semi-replicable "techniques" for training GPT-3, and to quantify their reliability.

So for example, imagine somebody comes up with a precise technical method for prompting GPT-3 to correctly classify whether or not parentheses are balanced, and also for determining stop conditions at which point the run will ... (read more)
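
One concrete way to do the "quantifying their reliability" part, sketched under the assumption that you can re-run the same prompt repeatedly and grade completions automatically (the function names are illustrative, not an existing tool):

```python
import math

def prompt_success_rate(run_prompt, grade_output, n_trials: int = 50):
    """Estimate a prompt technique's reliability with a rough 95% interval.

    `run_prompt` samples one completion; `grade_output` returns True/False.
    Both are placeholders for whatever harness you use to query the model.
    """
    successes = sum(grade_output(run_prompt()) for _ in range(n_trials))
    p = successes / n_trials
    margin = 1.96 * math.sqrt(p * (1 - p) / n_trials)  # normal-approximation CI
    return p, (max(0.0, p - margin), min(1.0, p + margin))
```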

2Icehawk787hThe main thing I've noticed is that in most of the posts talking about its capabilities (or even about what theoretical future entities might be capable of, based on a biased assumption of this version's capabilities), people are trying to figure out how to get it to succeed, rather than trying to get it to fail in interesting and informative ways.

For example, one of the evaluations I've seen was having it do multi-digit addition, and discussing various tricks to improve its success rate, going off the assumption that if it can fairly regularly do 1-3 digit addition, that's evidence of it learning arithmetic. One null hypothesis against this would be "in its 350-700GB model, it has stored lookup tables for 1-3 digit addition, which it will semi-frequently end up engaging." The evaluation against a lookup table was to compare its success rate at 5+ digit numbers, and show that storing a lookup table for those numbers would be an increasingly large portion of its model, which then suggests that it must be capable, sometimes, of doing math (and thus the real trick is in convincing it to actually do that). However, this ignores significantly more probable outcomes, and also doesn't look terribly closely at what the incorrect outputs are for large-digit addition, to try and evaluate *what* exactly the model did wrong (because the outputs obviously aren't random).

I've also seen very little in the way of discussion of what the architectural limitations of its capabilities are, despite them being publicly known; for example, any problem requiring deep symbolic recursion is almost certainly not possible simply due to the infrastructure of the model - it's doing a fixed number of matrix multiplications, and can't, as the result of any of those, step backwards through the transformer and reapply a particular set of steps again. On the plus side, this also means you can't get it stuck in an infinite loop before receiving the output.
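
The storage argument in the comment above is easy to make concrete with back-of-the-envelope arithmetic (my numbers, counting ordered pairs of n-digit operands):

```python
# Back-of-the-envelope check on the lookup-table argument: how many entries
# would a table of all n-digit + n-digit addition problems need?
for n in (3, 5, 10):
    pairs = (9 * 10 ** (n - 1)) ** 2      # ordered pairs of n-digit numbers
    print(f"{n}-digit addition: ~{pairs:.2e} entries")
# 3-digit: ~8.1e5 (trivially memorizable); 5-digit: ~8.1e9; 10-digit: ~8.1e19,
# which is the basis of the "it can't just be a lookup table" argument that
# the comment above is questioning.
```
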
13TurnTrout9hAs you say, highlight posts give biased impressions of GPT-3's capabilities. This bias remains even for readers who are consciously aware of that fact, since the underlying emotional impression may not adjust appropriately. So, for example, when I tell the reader that "only 30% of completions produced correct answers [https://www.lesswrong.com/posts/L5JSMZQvkBAx9MD5A/to-what-extent-is-gpt-3-capable-of-reasoning#Interview__4]", that isn't the same as seeing the 70%-dumb answers. Another problem is that AIDungeon doesn't let you save the entire tree of edits, reversions, and rerolls. So, even if you link the full transcript, readers are still only getting the impressive version. If you wanted to overcome this, you'd have to bore readers with all of the stupid runs. No one wants to do that. I'm currently:

  • Explicitly noting where rerolls take place, or at least noting how many occurred for a given generation
  • Sampling the output distribution and giving qualitative summaries, particularly near things I'm claiming are impressive or cool
  • Interrogating the model with Story-modification
  • Including some runs where GPT-3 fails [https://www.lesswrong.com/posts/L5JSMZQvkBAx9MD5A/to-what-extent-is-gpt-3-capable-of-reasoning#Interview__4]

I'd love to hear other suggested best practices. For my part, a lot of the questions I have about GPT-3 are of the form "is there a non-negligible chance it produces correct answers to fresh problems which seemingly require reasoning to solve?". So far, I'm very impressed at how often that has been true.

Pet peeve about privacy: I think people are woefully inadequate at asking, and answering, "Can you keep this confidential?"

Disclosure: I am not inherently great at keeping information private. By default, if a topic came up in conversation, I would accidentally sometimes say my thoughts before I had time to realize "oh, right, this was private information I shouldn't share."

I've worked over the past few years to become better at this – I've learned several specific skills and habits that make it easier. But I didn't learn those skills in school, and no one even really suggested I was supposed ... (Read more)

1lejuletre7hi haven't read all of the comments so idk if someone mentioned this further down, but there was a whole tumblr ordeal with this a few weeks back, and the conclusion that made the most sense to me was "don't share information about someone that could make them the victim of a hate crime," even if you think you know that the person you're about to disclose to would be a safe person. You don't know who that person in turn is going to share with. i struggle with the topic of this post a Lot, and the tumblr rule of thumb has been helpful for me.

Is this in the context of de-anonymization? 

This rule seems like it might make sense in some circles and circumstances, but taken at face value means basically never sharing any information about anyone, which seems too strong to be a general rule. I'm not 100% sure I understand the context. 

But, if someone is blogging on tumblr pseudonymously, I do think the default should probably be to not share private info about them with most people.

Arguments against myopic training

Note that this post has been edited to clarify the difference between explicitly assigning a reward to an action based on its later consequences, versus implicitly reinforcing an action by assigning high reward during later timesteps when its consequences are observed. I'd previously conflated these in a confusing way; thanks to Rohin for highlighting this issue.

A number of people seem quite excited about training myopic reinforcement learning agents as an approach to AI safety (for instance this post on approval-directed agents, proposals 2, 3, 4, 10 and 11 here, and this paper and pres... (Read more)
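
For readers unfamiliar with the term, one simple way to operationalize the distinction the note above draws (my gloss, not the post's formal definition) is the discount factor used when computing the return that reinforces each action:

```python
# Illustrative gloss on "myopic training" (a simplification, not the post's
# formal definition): with gamma = 0 an action is reinforced only by its
# immediate reward; with gamma > 0 it is also credited for later consequences.

def returns(rewards, gamma: float):
    """Discounted return assigned to each timestep's action."""
    out = []
    for t in range(len(rewards)):
        out.append(sum(gamma ** k * r for k, r in enumerate(rewards[t:])))
    return out

rewards = [0.0, 0.0, 1.0]           # the payoff only shows up two steps later
print(returns(rewards, gamma=0.0))  # [0.0, 0.0, 1.0] -> earlier actions get no credit
print(returns(rewards, gamma=0.9))  # [0.81, 0.9, 1.0] -> earlier actions share credit
```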

2rohinmshah11hThat seems to be a property of myopic cognition rather than myopic training? (See also this comment [https://www.lesswrong.com/posts/GqxuDtZvfgL2bEQ5v/arguments-against-myopic-training?commentId=xEeZ5rEJ2f28B2gFP] .)
4rohinmshah11hTo my understanding Abram, Richard and I agree that myopic cognition (what you're calling "internally myopically imitating") would confer benefits, but we don't think that myopic training is likely to lead to myopic cognition. That might be the crux?
4evhub11hSure, but imitative amplification can't be done without myopic training or it ceases to be imitative amplification and becomes approval-based amplification, which means you no longer have any nice guarantees about limiting to HCH.

What about imitating HCH using GAIL and AIRL? I wouldn't really call that myopic training (if you do, I'm curious what your definition of "myopic training" is).
