This post is signal-boosting and recommending a strategy for improving your decision making that I picked up from the entrepreneur Ivan Mazour. He describes the process here, and publishes his own results every year on his blog.
In his words...
I believe that life is far too fast-paced for us to be able to make rational, carefully thought-out, decisions all the time. This completely contradicts my mathematical upbringing and training, but is something I have come to realise throughout my twenties. We need a way of keeping up with the constant barrage of decisions, even when the i
All of economics, every last bit of it, is about scarcity. About what is scarce and what is not, and about who does and who doesn’t get their needs (and sometimes wants) satisfied.
Much of the debate about healthcare is in fact a scarcity problem. There aren’t enough practitioners in practice to handle every patient, so some people don’t get doctors. It’s a combination of self-selection where people who can’t afford to take time off to have an ingrown toenail treated professionally hack at it with a pocketknife instead, and insurance selection where granny…
Can you give a central example about a situation where more people receiving healthcare is worse, and why we should characterize that situation as one where more people receive healthcare?
If the government restricts the supply of meat (and food generally is adequately distributed), then the finite supply of meat makes it positional, and the fact that all needs are being met (within the scope of the example, at least) makes the outcome satisfactory.
If you thought that I intended some element of "maximize production even if all needs have already been met", then we have completely failed to communicate.
Here are six cases where I was pretty confident in my understanding of the microeconomics of something, but then later found out I was missing an important consideration.
Thanks to Richard Ngo and Tristan Hume for helpful comments.
Here’s the list of mistakes:
I thought divesting from a company had no effect on the company.
I thought that the prices on a prediction market converged to the probabilities of the underlying event.
I thought that I shouldn’t expect to be able to make better investment decisions than buying index funds.
I had a bad understanding of externalities, which was i
Czynski: In an immediate but not useful sense, from your bank, because while the
calculations by which they determine that this mortgage is a good investment involve
other actors, none of the interaction with other actors is instantaneous.
On the time scale of a month (and probably a week), partially from the company
who bought your mortgage for its risk-adjusted net present value (let's just
call that V), and the rest from 'nowhere', i.e. the amount they're now allowed
to lend by the fractional reserve banking regulations based on the fact that
their reserves have just increased by V. Trying to work out which of those
portions is bigger is making my head hurt, mostly because I'm pretty sure the
relative value of the risk-adjusted net present value of the mortgage's future
cash flow, compared with the reserve fraction and the actual face value of the
mortgage, matters, but I'm not quite sure how.
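As a rough illustration of the arithmetic being gestured at here, a minimal sketch under a textbook money-multiplier model (the reserve ratios and the V figure below are made-up assumptions for illustration, not a claim about how any actual bank or regulator works):

```python
# Toy fractional-reserve arithmetic: how much new lending can reserves of V support?
def max_new_lending(new_reserves: float, reserve_ratio: float) -> float:
    """Upper bound on system-wide new lending supported by `new_reserves`
    under a simple textbook money-multiplier model."""
    if reserve_ratio <= 0:
        return float("inf")  # a zero reserve requirement puts no ceiling here
    # Each dollar of reserves backs 1/reserve_ratio dollars of deposits,
    # of which (1 - reserve_ratio)/reserve_ratio dollars can be new loans.
    return new_reserves * (1 - reserve_ratio) / reserve_ratio

V = 300_000  # hypothetical risk-adjusted net present value paid for the mortgage
for r in (0.10, 0.03, 0.0):
    print(f"reserve ratio {r:.0%}: up to {max_new_lending(V, r):,.0f} in new lending")
```

In this toy model, the "which portion is bigger" question comes down to whether V·(1 - r)/r exceeds V, i.e. whether r is below 50%; the zero-requirement case mentioned in the next comment removes the ceiling entirely (other constraints, such as capital requirements, still exist in practice).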
romeostevensit: Yes, though the reserve requirement was recently dropped to zero.
Czynski: Wat. Is there some more complicated picture, there, where they don't have to
maintain real reserves but they do need to hold something like overnight loans
from the Fed as reserves? Because if they're really untethered from reserves of
any kind, they can 'print' as much money as they want, which seems...
It's not necessarily a terrible idea, but it does entail that the central bank
give up its monopoly, and that seems like an insane move from the central bank's
perspective.
I’ve spent a lot of time developing tools and frameworks for bridging “intractable” disagreements. I’m also the person affiliated with CFAR who has taught Double Crux the most, and done the most work with it.
People often express to me something to the effect of:
The important thing about Double Crux is all the low level habits of mind: being curious, being open to changing your mind, paraphrasing to check that you’ve understood, operationalizing, etc. The ‘Double Crux’ framework itself is not very important.
I half agree with that sentiment. I do think that the low level cognitive and conversation…
This comment helped me articulate something that I hadn't quite put my finger on before.
There are actually two things that I want to stand up for, which are, from a naive perspective, in tension. So I think I need to make sure not to lump them together.
On the one hand, yeah, I think it is deeply true that you can unilaterally do the thing, and with sufficient skill, you can make "the Double Crux thing" work, even with a person who doesn't explicitly opt in for that kind of discourse (because curiosity and empathy are contagious, and many…
elityre: For context, Mark participated in a session I ran via Zoom last weekend that covered this pattern.
For what it's worth, that particular conversation is the main thing that caused me to add a paragraph about distillation (even just as a bookmark) to the OP.
I'm not super confident what would have most helped there, though.
elityre: Gar. I thought I finally got images to work on a post of mine.
Raemon: I live in the bubble that literally invented Doublecrux, so it indeed is not
surprising if my experience doesn’t generalize.
In this competition, we (Ought) want to amplify Rohin Shah’s forecast for the question: When will a majority of AGI researchers agree with safety concerns? Rohin has provided a prior distribution based on what he currently believes, and we want others to:
Try to update Rohin’s thinking via comments (for example, comments including reasoning, distributions, and information sources). If you don’t want your comment to be considered for the competition, label it ‘aside’
Predict what his posterior distribution for the question will be after he has read all the comments.
Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?
What counts as building superintelligent AGI?
This could mean anything from working on foundational theory which could be used to facilitate building an AGI, to finishing the final phase of training on a fully functional AGI implementation.
In the former case you're going to get close to 0% agreement. In the latter, well over 50% already (I hope!).
elifland: I think it's >1% likely that one of the first few surveys Rohin conducted
would result in a fraction of >0.5.
Evidence from When Will AI Exceed Human Performance?
[https://arxiv.org/pdf/1705.08807.pdf#page=13], in the form of median survey
responses of researchers who published at ICML and NIPS in 2015:
* 5% chance given to Human Level Machine Intelligence (HLMI) having an
extremely bad long run impact (e.g. human extinction)
* Does Stuart Russell's argument for why highly advanced AI might pose a risk
point at an important problem? 39% say at least important, 70% at least
moderately important.
* But on the other hand, only 8.4% said working on this problem now is more
valuable than other problems in the field. 28% said as valuable as other
problems.
* 47% agreed that society should prioritize "AI Safety Research" more than it
currently was.
These seem like fairly safe lower bounds compared to the population of
researchers Rohin would evaluate, since concern regarding safety has increased
since 2015 and the survey included all AI researchers rather than only those
whose work is related to AGI.
These responses are more directly related to the answer to Question 3 ("Does X
agree that there is at least one concern such that we have not yet solved it and
we should not build superintelligent AGI until we do solve it?") than Question 2
("Does X broadly understand the main concerns of the safety community?"). I feel
very uncertain about the percentage that would pass Question 2, but think it is
more likely to be the "bottleneck" than Question 3.
Given these considerations, I increased the probability before 2023 to 10%. I
moved the median | not never up to 2035 as a higher probability pretty soon also
means a sooner median. I decreased the probability of “never” to 20%, since the
“not enough people update on it / consensus building takes forever / the
population I chose just doesn't pay attention to safety for some reason”
condition seems
Beth Barnes: Yeah I also thought this might just be true already, for similar reasons
Benjamin Rachbach: My distribution [https://elicit.ought.org/builder/tBZpx5MVT]
My biggest differences with Rohin's prior distribution are:
1. I think that it's much more likely than he does that AGI researchers already
agree with safety concerns
2. I think it's considerably more likely than he does that the majority of AGI
researchers will never agree with safety concerns
These differences are explained more on my distribution and in my other
comments.
The next step that I think would help the most to make my distribution better
would be to do more research
[https://www.lesswrong.com/posts/Azqmzp5JoXJihMcr4/amplify-rohin-s-prediction-on-agi-researchers-and-safety?commentId=BjwHryhtEptfmoB5z]
.
Twitter thread by Eliezer Yudkowsky, with the bounty in bold:
So I don't want to sound alarms prematurely, here, but we could possibly be looking at the first case of an AI pretending to be stupider than it is. In this example, GPT-3 apparently fails to learn/understand how to detect balanced sets of parentheses.
Now, it's possible that GPT-3 "legitimately" did not understand this concept, even though GPT-3 can, in other contexts, seemingly write code or multiply 5-digit numbers. But it's also possible that GPT-3, playing the role of John, predicted that *John* wouldn't learn it.
I don't think this is that crazy of a request. Many of the other fields of machine learning have robust visualizations that hazard a guess at what the AI is "thinking." I haven't seen an equivalent thing for Transformer-based NLP models, but why not?
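For reference, the task GPT-3's "John" is being asked to perform has an exact specification, so every completion can be scored automatically. A minimal ground-truth checker (my own illustrative code, not anything from the thread):

```python
def is_balanced(s: str) -> bool:
    """Return True iff the parentheses in `s` are balanced."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing paren with no matching opener
                return False
    return depth == 0

assert is_balanced("(()())")
assert not is_balanced("(()")
assert not is_balanced(")(")
```

Having an exact oracle like this is part of what makes the parentheses example attractive for probing whether the model (or the character it is playing) has actually learned the concept.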
ESRogs: This does seem like an interesting question. But I think we should be careful to
measure against the task we actually asked the system to perform.
For example, if I ask my system to produce a cartoon drawing, it doesn't seem
very notable if I get a cartoon as a result rather than a photorealistic image,
even if it could have produced the latter.
Maybe what this just means is that we should track what the user understands the
task to be. If the user thinks of it as "play a (not very smart) character who's
asked to do this task", they'll have a pretty different understanding of what's
going on than if they think of it as "do this task."
I think what's notable in the example in the post is not that the AI is being
especially deceptive, but that the user is especially likely to misunderstand
the task (compared to tasks that don't involve dialogues with characters).
Vaniver: Consider instead the scenario where I show you a photo of a face, and you
produce a photo of the side of the face. An interesting question is "is there a
3d representation of the face in the model?". It could be getting the right
answer that way, or it could be getting it some other way.
Similarly, when it models a 'dumb' character, is it calculating the right
answer, and then computing an error? Or is it just doing something dumb, which
incidentally turns out to be wrong?
Like, when you look at this example
[https://twitter.com/ESYudkowsky/status/1285663436299546624]:
How did it come up with 19 and 20? What would it take to make tools that could
answer that question?
13Kaj_Sotala9h"It's tempting to anthropomorphize GPT-3 as trying its hardest to make John
smart" seems obviously incorrect if it's explicitly phrased that way, but e.g.
the "Giving GPT-3 a Turing Test
[http://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html]" post seems to
implicitly assume something like it:
The author says that this "stumps" GPT-3, which "doesn't know how to" say that
it doesn't know. That's as if GPT-3 was doing its best to give "smart" answers,
and just was incapable of doing so. But Nick Cammarata showed
[https://twitter.com/nicklovescode/status/1284050958977130497] that if you just
give GPT-3 a prompt where nonsense answers are called out as such, it will do
just that.
LessWrong has been around for 10+ years, CFAR's been at work for around 6, and I think there have been at least a few other groups or individuals working on what I think of as the "Human Rationality Project."
I'm interested in hearing, especially from people who have invested significant time in attempting to push the rationality project forward, what they consider the major open questions facing the field. (More details in this comment)
"What is the Rationality Project?"
I'd prefer to leave "Rationality Project" somewhat vague, but I'd roughly summarize it…
Even assuming you're correct, postrationalism won't help with any of that because it's nothing but systematized self-delusion. Rationality may not have benefits as huge as one'd naively expect, but it is still substantially better than deliberately turning your back on even attempting to be rational, which is what postrationalism does - intentionally!
So there’s this thing where GPT-3 is able to do addition, it has the internal model to do addition, but it takes a little poking and prodding to actually get it to do addition. “Few-shot learning”, as the paper calls it. Rather than prompting the model with
Q: What is 48 + 76? A:
… instead prompt it with
Q: What is 48 + 76? A: 124
Q: What is 34 + 53? A: 87
Q: What is 29 + 86? A:
The same applies to lots of other tasks: arithmetic, anagrams and spelling correction, translation, assorted benchmarks, etc. To get GPT-3 to do the thing we want, it helps to give it a few examples, so it can “figure out wh…
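A minimal sketch of how such a few-shot prompt might be assembled programmatically (my own illustrative code; the Q/A format just mirrors the examples above, and the digit ranges are arbitrary):

```python
import random

def make_addition_prompt(n_examples: int = 2, seed: int = 0) -> str:
    """Build a few-shot addition prompt: worked examples first, then the real question."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        a, b = rng.randint(10, 99), rng.randint(10, 99)
        lines.append(f"Q: What is {a} + {b}? A: {a + b}")
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    lines.append(f"Q: What is {a} + {b}? A:")  # left blank for the model to complete
    return "\n".join(lines)

print(make_addition_prompt())
```

The worked examples are there to tell the model which task it is being asked to play out; the unanswered final line is the one whose completion we actually care about.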
dxu: GPT-3 and systems like it are trained to mimic human discourse. Even if (in the
limit of arbitrary computational power) it manages to encode an implicit
representation of human values somewhere in its internal state, in actual
practice there is nothing tying that representation to the phrase "human
values", since moral philosophy is written by (confused) humans, and in
human-written text the phrase "human values" is not used in the consistent,
coherent manner that would be required to infer its use as a label for a fixed
concept.
John_Maxwell: This is essentially the "tasty ice cream flavors" problem, am I right? Trying to
check if we're on the same page.
If so: John Wentworth said
So how about instead of talking about "human values", we talk about what a
particular moral philosopher endorses saying or doing, or even better, what a
committee of famous moral philosophers would endorse saying/doing.
2dxu3hOn "conceding the point":
The thesis that values are fragile doesn't have anything to do with how easy it
is to create a system that models them implicitly, but with how easy it is to
get an arbitrarily intelligent agent to behave in a way that preserves those
values. The difference between those two things is analogous to the difference
between a prediction task and a reinforcement learning task, and your argument
(as far as I can tell) addresses the former, not the latter. Insofar as my
reading of your argument is correct, there is no point to concede.
On gwern's article:
I'm not sure how to respond to this, except to state that neither this specific
claim nor anything particularly close to it appears in the article I linked.
On Tool AI:
As far as I'm aware, this point has never been the subject of much dispute.
This is still arguable; I have my doubts, but in a "big picture" sense this is
largely irrelevant to the greater point, which is:
This is (and remains) the crux. I still don't see how GPT-3 supports this claim!
Just as a check that we're on the same page: when you say "value-loading
problem", are you referring to something more specific than the general issue of
getting an AI to learn and behave according to our values?
***
META: I can understand that you're frustrated about this topic, especially if it
seems to you that the "MIRI-sphere" (as you called it in a different comment) is
persistently refusing to acknowledge something that appears obvious to you.
Obviously, I don't agree with that characterization, but in general I don't want
to engage in a discussion that one side is finding increasingly unpleasant,
especially since that often causes the discussion to rapidly deteriorate in
quality after a few replies.
As such, I want to explicitly and openly relieve you of any social obligation
you may have felt to reply to this comment. If you feel that your time would be
better spent elsewhere, please do!
The thesis that values are fragile doesn't have anything to do with how easy it is to create a system that models them implicitly, but with how easy it is to get an arbitrarily intelligent agent to behave in a way that preserves those values. The difference between those two things is analogous to the difference between a prediction task and a reinforcement learning task, and your argument (as far as I can tell) addresses the former, not the latter. Insofar as my reading of your argument is correct, there is no point to concede.
If you can solve the prediction task, you can probably use the solution to create a reward function for your reinforcement learner.
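A sketch of the step being claimed in that last sentence, with everything here hypothetical (`predicted_human_approval` stands in for whatever model solves the prediction task; none of this is from the original discussion):

```python
from typing import Callable

# Hypothetical: a model that solves the *prediction* task, i.e. estimates how a
# human would evaluate a described outcome. Here it is just a type alias.
PredictionModel = Callable[[str], float]

def make_reward_fn(predicted_human_approval: PredictionModel) -> Callable[[str], float]:
    """Wrap a solved prediction task as a reward signal for a reinforcement learner."""
    def reward(outcome_description: str) -> float:
        return predicted_human_approval(outcome_description)
    return reward

# Toy stand-in predictor so the sketch runs end to end.
toy_predictor: PredictionModel = lambda s: 1.0 if "everyone is fine" in s else 0.0
reward_fn = make_reward_fn(toy_predictor)
print(reward_fn("everyone is fine and the lights stayed on"))  # 1.0
```

Whether an agent optimizing this wrapped signal actually preserves the values the predictor encodes is exactly the gap dxu points at above; the wrapper itself does nothing to close it.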
Welcome. This week we discuss the sixth section in the reading guide: Intelligence explosion kinetics. This corresponds to Chapter 4 in the book, of a similar name. This section is about how fast a human-level artificial intelligence might become superintelligent.
This post summarizes the section, and offers a few relevant notes, a…
FWIW I think this 'milestone' is much less clear than Bostrom makes it sound. I'd imagine there's a lot of variation in fidelity of simulation, both measured in terms of brain signals and in terms of behaviour, and I'd be surprised if there were some discrete point at which everybody realised that they'd got it right.
Cross-posting from my personal blog, but written primarily for Less Wrong after recent discussion here.
There are whispers that the Efficient-Market Hypothesis is dead. Eliezer's faith has been shaken. Scott says EMH may have been the real victim of the coronavirus.
The EMH states that “asset prices reflect all available information”. The direct implication is that if you don’t have any non-available information, you shouldn’t expect to be able to beat the market, except by chance.
But some people were able to preempt the corona crash, without…
I'm probably missing something - but when you say the vast majority of active investors do really badly, shouldn't that be impossible too? If markets are truly efficient, isn't it just as hard to underperform them as outperform?
This post outlines a hierarchy of behavioral change methods. Each of these approaches is intended to be simpler, more lightweight, and faster to use (is that right?) than the one that comes after it. On the flip side, each of these approaches is intended to resolve a common major blocker of the approach before…
GPT-3 is captivating, and not just because of its potential reasoning abilities. This post will be a living collection of my favorite experiences with the network.
Bold text is my input; square brackets contain my commentary.
Long-form Writing
Beisutsukai Class Project: Solve Intent Alignment
"I've been thinking about this for far longer than you have. Building an AI is like building a person; it's not just a matter of throwing some parts together and suddenly you have a sentient being that does exactly what you want. AI design is a delicate art that requires hundreds of precise calibrations
We take the web for granted, but maybe we shouldn't. It's very large and nobody can read it all. There are many places we haven't been that probably have some pretty good writing. I wonder about the extent to which GPT-3 can be considered a remix of the web that makes it seem magical again, revealing aspects of it that we don't normally see? When I see writing like this, I wonder what GPT-3 saw in the web corpus. Is there an archive of Tolkien fanfic that was included in the corpus? An undergrad physics forum? Conversations about math and computer science?
I've been reading a fair bit about "worse than death" scenarios from AGI (e.g. posts like this), and the intensities and probabilities of them. I've generally been under the impression that the worst-case scenarios have extremely low probabilities (i.e. would require some form of negative miracle to occur) and can be considered a form of Pascal's mugging.
Recently, however, I came across this post on OpenAI's blog. The blog post notes the following:
Bugs can optimize for bad behavior
One of our code refactors introduced a bug which flipped the sign of the reward. Fl
Sorry for the dumb question a month after the post, but I've just found out about deceptive alignment. Do you think it's plausible that a signflipped AGI could fake being an FAI in the training stage, just to take a treacherous turn at deployment?
When EAs look at the history of nuclear weapons, their reactions tend to fall into two camps.
The first camp (which I am inclined towards) is “Man, what a total mess. There were so many near misses, and people involved did such clearly terrible and risky things like setting up the dead hand system and whatever else. I guess that humans probably can’t be trusted to handle extremely dangerous technology.”
The other camp says “No nuclear weapons have been used or detonated accidentally since 1945. This is the optimal outcome, so I guess this is evidence that humanity is…
Lanrian: There's probably some misunderstanding, but I'm not immediately spotting it when
rereading. You wrote:
Going by the parent comment, I'm interpreting this as
* it = "we didn't observe nukes going off"
* X = "humans are competent at handling dangerous technology"
I think that
* SIA thinks that "we didn't observe nukes going off" is relatively stronger
evidence for "humans are competent at handling dangerous technology" (because
SIA ignores observer selection effects, and updates naively).
* SSA thinks that "we didn't observe nukes going off" is relatively weaker
evidence for "humans are competent at handling dangerous technology" (because
SSA doesn't update against hypotheses which would kill everyone).
Which seems to contradict what you wrote?
Pongo: Yep, sorry, looks like we do disagree.
Not sure I'm parsing your earlier comment correctly, but I think you say "SIA
says there should be more people everywhere, because then I'm more likely to
exist. More people everywhere means I think my existence is evidence for people
handling nukes correctly everywhere". I'm less sure what you say about SSA,
either "SSA still considers the possibility that nukes are regularly mishandled
in a way that kills everyone" or "SSA says you should also consider yourself
selected from the worlds with no observers".
Do I have you right?
I say, "SIA says that if your prior is '10% everyone survives, 20% only 5%
survive, 70% everyone dies', and you notice you're in a 'survived' world, you
should think you are in the 'everyone survives' world with 90% probability (as
that's where 90% of the probability-weighted survivors are)".
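Spelling out the arithmetic behind that 90% figure (a direct computation of the SIA-style update described above, with observer counts normalized to the full-survival world):

```python
# Worlds: (prior probability, fraction of observers who survive)
worlds = {
    "everyone survives": (0.10, 1.00),
    "only 5% survive":   (0.20, 0.05),
    "everyone dies":     (0.70, 0.00),
}

# SIA-style update: weight each world by prior * surviving observers,
# then condition on the fact that you are a survivor.
weights = {name: p * frac for name, (p, frac) in worlds.items()}
total = sum(weights.values())
for name, w in weights.items():
    print(f"{name}: {w / total:.1%}")
# -> everyone survives: 90.9%, only 5% survive: 9.1%, everyone dies: 0.0%
```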
rohinmshah: Okay, fair enough, though I want to note that the entire disagreement in the
post is about the backward-looking sense (if I'm understanding you correctly).
Like, the question is how to interpret the fact that there were a lot of near
misses, but no catastrophes (and for the class of nukes / bio etc. a 10%
catastrophe seems way more likely than a 90-100% catastrophe).
Okay, fair enough, though I want to note that the entire disagreement in the post is about the backward-looking sense (if I'm understanding you correctly).
Oh interesting! I suspect you understand me correctly and we disagree. To elaborate:
If it means something for humans to be "good at coordination", it's that there are some underlying features that cause humans to succeed rather than fail at coordination challenges. If I said someone was "good at winning poker games", I don't just mean that they happened to win once…
In light of reading Hazard's Shortform Feed -- which I really enjoy -- based on Raemon's Shortform feed, I'm making my own. There be thoughts here. Hopefully, this will also get me posting more.
Hazard: One way I think about things. Everything that I've found in myself and close friends that looks and smells like "shoulds" is sorta sneaky. I keep on finding shoulds which seem to have been absorbed from others and are less about "this is a good way to get a thing in the world that I want" and more about "someone said you need to follow this path and I need them to approve of me". The force I feel behind my shoulds is normally "You're SCREWED if you don't!", a sort of vaguely panicky, inflexible energy. It's rarely connected to the actual good qualities of the thing I "should" be doing.
Because my shoulds normally ground out in "if I'm not this way, people won't like me", if the pressure gets turned up, following a should takes me farther and farther away from things I actually care about. Unblocking stuff often feels like transcending the panicky fear that hides behind a should. It never immediately lets me be awesome at stuff. I still need to develop a real connection to the task and how it works into the rest of my life. There's still drudgery, but it's dealt with from a calmer place.
mr-hire: I think removing internal conflicts is powerful but not sufficient.
The people who are most productive are also great at amplifying external
conflicts. That is, they have a clear, strong vision, and amplify the creative
tension between what they have and know they can have. This can help you do
things that are not "fun" like deliberate practice. but are totally aligned, in
that you have no objections to doing them, and have a stance of acceptance
towards the things that are not enjoyable.
The best then augment that with powerful external structures that are supportive
of their ideal internal states and external behaviors.
Each one of these taken far enough can be powerful, and when combined together
they are more than the sum of their parts.
NaiveTortoise: Thanks, this framing is helpful for me for understanding how these things can be
seen to fit together.
If it’s worth saying, but not worth its own post, here's a place to put it. (You can also make a shortform post)
And, if you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.
steve2152: Yeah. There's no gradient descent within a single episode, but if you have a
network with input (as always) and with memory (e.g. an RNN) then its behavior
in any given episode can be a complicated function of its input over time in
that episode, which you can describe as "it figured something out from the input
and that's now determining its further behavior". Anyway, everything you said is
right, I think.
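A toy illustration of that point (a sketch with arbitrary fixed weights; no training step anywhere): a recurrent network's late-episode behavior can depend on early-episode inputs purely through its hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 4)) * 0.5  # fixed recurrent weights (never updated)
W_x = rng.normal(size=(4, 1))        # fixed input weights

def run_episode(inputs):
    """Run the fixed-weight RNN over one episode and return its final hidden state."""
    h = np.zeros((4, 1))
    for x in inputs:
        h = np.tanh(W_h @ h + W_x * x)  # hidden state accumulates the input history
    return h.ravel()

# Identical later inputs, different first input -> different end-of-episode state,
# i.e. the network "figured something out" from earlier in the episode.
print(run_episode([1.0, 0.0, 0.0]))
print(run_episode([-1.0, 0.0, 0.0]))
```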
Zack_M_Davis: Interested. Be sure to check out Gwern's page on embryo selection
[https://www.gwern.net/Embryo-selection] if you haven't already.
ChaseG: This post is an absolute gold mine. Thank you!
I am getting worried that people are having so much fun doing interesting stuff with GPT-3 and AI Dungeon that they're forgetting how easy it is to fool yourself. Maybe we should think about how many different cognitive biases are in play here? Here are some features that make it particularly easy during casual exploration.
First, it works much like autocomplete, which makes it the most natural thing in the world to "correct" the transcript to be more interesting. You can undo and retry, or trim off extra text if it generates more than you want.
Being highly skeptical of this GPT-3 "research" myself, let me make a meta-contrarian argument in favor of ways that we could do more constructive GPT-3 research, without letting the perfect be the enemy of the good.
One way is to try and develop semi-replicable "techniques" for training GPT-3, and quantifying their reliability.
So for example, imagine somebody comes up with a precise technical method for prompting GPT-3 to correctly classify whether or not parentheses are balanced, and also for determining stop conditions at which point the run will…
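One concrete shape "quantifying their reliability" could take (a sketch, assuming each trial can be scored automatically against ground truth, as with balanced parentheses; the 95% interval is a plain normal approximation):

```python
import math

def summarize_reliability(trial_results):
    """Given per-trial booleans (did the prompting technique get it right?),
    report the success rate with a rough 95% normal-approximation interval."""
    n = len(trial_results)
    p = sum(trial_results) / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, half_width

# e.g. 14 correct classifications out of 20 attempts with some fixed prompt:
rate, ci = summarize_reliability([True] * 14 + [False] * 6)
print(f"success rate {rate:.0%} ± {ci:.0%}")  # 70% ± 20%
```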
Icehawk78: The main thing I've noticed is that in most of the posts talking about its capabilities (or even about what theoretical future entities might be capable of, based on a biased assumption of this version's capabilities), people are trying to figure out how to get it to succeed, rather than trying to get it to fail in interesting and informative ways.
For example, one of the evaluations I've seen was having it do multi-digit
addition, and discussing various tricks to improve its success rate, going off
the assumption that if it can fairly regularly do 1-3 digit addition, that's
evidence of it learning arithmetic. One null hypothesis against this would be
"in its 350-700GB model, it has stored lookup tables for 1-3 digit addition,
which it will semi-frequently end up engaging."
The evaluation against a lookup table was to compare its success rate at 5+
digit numbers, and show that storing a lookup table for those numbers would be
an increasingly large portion of its model, and then to suggest that this implies
it must be capable, sometimes, of doing math (and thus the real trick is in
convincing it to actually do that). However, this ignores significantly more
probable outcomes, and also doesn't look terribly closely at what the incorrect
outputs are for large-digit addition, to try and evaluate *what* exactly it is that the model did wrong (because the outputs obviously aren't random).
I've also seen very little by the way of discussing what the architectural
limitations of its capabilities are, despite them being publicly known; for
example, any problem requiring deep symbolic recursion is almost certainly not
possible simply due to the infrastructure of the model - it's doing a concrete
number of matrix multiplications, and can't, as the result of any of those, step
backwards through the transformer and reapply a particular set of steps again.
On the plus side, this also means you can't get it stuck in an infinite loop
before receiving the output.
TurnTrout: As you say, highlight posts give biased impressions of GPT-3's capabilities. This
bias remains even for readers who are consciously aware of that fact, since the
underlying emotional impression may not adjust appropriately. So, for example,
when I tell the reader that "only 30% of completions produced correct answers
[https://www.lesswrong.com/posts/L5JSMZQvkBAx9MD5A/to-what-extent-is-gpt-3-capable-of-reasoning#Interview__4]
", that isn't the same as seeing the 70%-dumb answers.
Another problem is that AIDungeon doesn't let you save the entire tree of edits,
reversions, and rerolls. So, even if you link the full transcript, readers are
still only getting the impressive version. If you wanted to overcome this, you'd
have to bore readers with all of the stupid runs. No one wants to do that.
I'm currently:
* Explicitly noting where rerolls take place, or at least noting how many
occurred for a given generation
* Sampling the output distribution and giving qualitative summaries,
particularly near things I'm claiming are impressive or cool
* Interrogating the model with Story-modification
* Including some runs where GPT-3 fails
[https://www.lesswrong.com/posts/L5JSMZQvkBAx9MD5A/to-what-extent-is-gpt-3-capable-of-reasoning#Interview__4]
I'd love to hear other suggested best-practices.
For my part, I think a lot of questions I have about GPT-3 are, "is there a
non-negligible chance it produces correct answers to fresh problems which
seemingly require reasoning to solve?". So far, I'm very impressed at how often
that has been true.
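To make the reroll bookkeeping concrete (a sketch; the 30% figure is borrowed from the comment above as an example rate, and independence of completions is an assumption): the expected number of completions per correct answer, and the chance that a "best of k" transcript contains a success, follow directly from the per-completion success rate.

```python
def reroll_stats(p_correct: float, k: int):
    """Expected completions per success, and P(at least one success among k rerolls),
    assuming completions are independent with success probability p_correct."""
    expected_attempts = 1 / p_correct
    p_at_least_one = 1 - (1 - p_correct) ** k
    return expected_attempts, p_at_least_one

attempts, hit_rate = reroll_stats(0.30, k=5)
print(f"~{attempts:.1f} completions per correct answer; "
      f"{hit_rate:.0%} chance that 5 rerolls contain at least one")  # ~3.3; 83%
```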
Pet peeve about privacy: I think people are woefully inadequate at asking, and answering, "Can you keep this confidential?"
Disclosure: I am not inherently great at keeping information private. By default, if a topic came up in conversation, I would accidentally sometimes say my thoughts before I had time to realize "oh, right, this was private information I shouldn't share."
I've worked over the past few years to become better at this – I've learned several specific skills and habits that make it easier. But I didn't learn those skills in school, and no one even really suggested I was supposed…
lejuletre: i haven't read all of the comments so idk if someone mentioned this further
down, but there was a whole tumblr ordeal with this a few weeks back, and the
conclusion that made the most sense to me was "don't share information about
someone that could make them the victim of a hate crime," even if you think you
know that the person you're about to disclose to would be a safe person. You
don't know who that person in turn is going to share with.
i struggle with the topic of this post a Lot, and the tumblr rule of thumb has
been helpful for me.
This rule seems like it might make sense in some circles and circumstances, but taken at face value means basically never sharing any information about anyone, which seems too strong to be a general rule. I'm not 100% sure I understand the context.
But, if someone is blogging on tumblr, pseudonymously, I do think it's probably right to default to not sharing private info about them with most people.
Note that this post has been edited to clarify the difference between explicitly assigning a reward to an action based on its later consequences, versus implicitly reinforcing an action by assigning high reward during later timesteps when its consequences are observed. I'd previously conflated these in a confusing way; thanks to Rohin for highlighting this issue.
rohinmshah: That seems to be a property of myopic cognition rather than myopic training?
(See also this comment
[https://www.lesswrong.com/posts/GqxuDtZvfgL2bEQ5v/arguments-against-myopic-training?commentId=xEeZ5rEJ2f28B2gFP]
.)
rohinmshah: To my understanding, Abram, Richard and I agree that myopic cognition (what
you're calling "internally myopically imitating") would confer benefits, but we
don't think that myopic training is likely to lead to myopic cognition. That
might be the crux?
evhub: Sure, but imitative amplification can't be done without myopic training or it
ceases to be imitative amplification and becomes approval-based amplification,
which means you no longer have any nice guarantees about limiting to HCH.
What about imitating HCH using GAIL and AIRL? I wouldn't really call that myopic training (if you do, I'm curious what your definition of "myopic training" is).
Cool!