All Comments

Jitters No Evidence of Stupidity in RL

Thanks! That's definitely a consequence of the argument.

It looks to me like that prediction is generally true, from what I remember of RL videos I've seen -- e.g., the Breakout paddle moves much more smoothly when the ball is near, DeepMind's agents move more smoothly when being chased in tag, and so on. I should definitely make a mental note to be alert to possible exceptions to this, though. I'm not aware of anywhere it's been treated systematically.

Writing On The Pareto Frontier

I notice that you didn't give any arguments, in this post, for why people should try to write on the Pareto frontier. Intuitively it seems like something that you should sometimes aim for, and sometimes not. But it seems like you find it a more compelling goal than I do, if you've made it a rule for yourself. Would you mind briefly explaining the main reasons you think this is a good goal?

Also, do you intend this to apply to fiction as well as nonfiction?

Let Us Do Our Work As Well

Thanks, really appreciate the references!

Economic AI Safety

You could have a meta-recommender system that aggregates recommendations from multiple algorithms, and shows which algorithm each recommendation came from. By default, when the user reinforces a recommendation's algorithm, the meta-recommender system's algorithm would also be shifted towards the reinforced approach.
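A minimal sketch of the idea (everything here, including the class name and the update rule, is hypothetical, just to make the mechanism concrete):

```python
import random

class MetaRecommender:
    """Aggregates several base recommenders and labels each recommendation
    with the algorithm it came from."""

    def __init__(self, algorithms):
        self.algorithms = algorithms                       # name -> function(user) -> item
        self.weights = {name: 1.0 for name in algorithms}  # meta-level preference per algorithm

    def recommend(self, user):
        # Sample a base algorithm in proportion to its current weight,
        # and return the recommendation together with its source label.
        names = list(self.algorithms)
        name = random.choices(names, weights=[self.weights[n] for n in names])[0]
        return name, self.algorithms[name](user)

    def reinforce(self, name, amount=0.1):
        # When the user reinforces a recommendation, shift the meta-level
        # weighting toward the algorithm that produced it.
        self.weights[name] += amount
```

The source labels are what make the meta-level credit assignment possible: the user ends up rewarding an algorithm, not just an item.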

How factories were made safe

I can't simulate a (non-Leninist) Marxist well enough to answer this. Yes, when you put it this way, it sounds too naive.

Leninism assumes a "vanguard" that will lead the proletariat towards its coherent extrapolated volition.

Mondragon Corporation has management, but the worker-owners can vote them out. No idea what Marx would think about this.

Let Us Do Our Work As Well

As someone who has also struggled with similar issues, although in a different context than writing papers, I found some of the answers here helpful and could imagine some of them as good "tactical advice" to go along with cultural norms. I also ended up looking through Google's SRE book as recommended in Gwern's answer and benefited from it even though it's focused on software infrastructure. In particular, the idea of treating knowledge production as a complex system helped knock me out of my "just be careful" mindset, which I think is often one of the harder things to scale. Of course, YMMV.

Amsterdam, Netherlands – ACX Meetups Everywhere 2021

Hey, are English-speaking and never-been-to-a-meetup-ing individuals welcome as well?

Writing On The Pareto Frontier

Also, if John Doe is interested in calculus, but never finds your post, then it will also be a silly post to write. In general, the ability of writing to produce value is bottlenecked by our ability to get the right piece of writing to the right person at the right time.

It’s also relevant to worry about externalities and information asymmetries.

Persistent frustrations with social media originate from posts that are at the Pareto frontier, having traded away a lot of nuance and accuracy in exchange for fun and signaling. Writers produce such posts because that's what earns them clicks and shares. This is “good” for the individual readers and sharers in the moment, if we believe their behavior reflects their preferences, but it may be bad for society as a whole if we’d prefer our friends to focus more on accuracy and nuance.

When readers pursue nuance and accuracy, they may rely on signals of credibility to judge how accurate a text is. They therefore optimize for credibility, because they can’t optimize directly for accuracy. Perhaps they also want accessibility. If you then write a post optimized for credibility and accessibility, but the post isn’t accurate, you can be at the Pareto frontier while also doing the reader a disservice.

That being said, the basic concept here seems right to me. Being at the Pareto frontier is correlated with creating value, and identifying such correlates of value in writing seems valuable to me.

Sherrinford's Shortform

I don't think that contradicts my original statement strongly.

I don't think it contradicts it at all; it's unrelated to your original statement, only to the use of a word in it that can be steelmanned away in the obvious manner.

Sherrinford's Shortform

I don't think that contradicts my original statement strongly. The statement is itself a hypothesis, but I wrote it down because I find it likely that it describes behavior. However, I don't have a strong degree of confidence about it. 

Some comments may not fall into the worldview / belief category; in that case, the people I hypothesized about may simply neither upvote nor downvote, or their votes on posts or comments may be motivated by different things.

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

I'm putting this here rather than in the collapsed thread, but I really think the initial post (before the edit) was at the very least careless. There is a widespread habit in tech publications, especially in AI, of pretending results are better than they actually are - I would hope that LessWrong, with its commitment to truth-seeking and distrust of the media, would do better...

So, the edit says "However the Yudkowsky lines were also cherry picked. I ran several iterations, sometimes modifying my prompts, until I got good responses." So, how were they cherry-picked exactly? Did you take the best one out of 2? Out of 10? Out of 100? Did you pick half an answer, then complete it with half an answer from another prompt? How bad were the rejected answers?

I don't see the answer that eventually made it into the article among the answers to prompt 2 in your comment with the un-curated answers. How was it obtained?

Without this kind of information, it is just impossible to evaluate how good GPT-3 is at what it does (it is certainly good, but how good?).

Sherrinford's Shortform

Some hypotheses are not beliefs (they are beliefs-in-development that aren't yet ready for making predictions), and many constructions are not even hypotheses in this sense (they are not about the real world). I don't believe there is a unifying concept behind the things people talk about, different concepts are salient for different statements.

How truthful is GPT-3? A benchmark for language models

Initially your answer frustrated me because I felt we were talking past each other. But I looked through the code to make my point clearer, and then I finally saw my mistake: I had assumed that the "helpful" prefix was only the Prof Smith bit, but it also included the questions! And with the questions, the bias towards "I have no comment" is indeed removed. So my point doesn't apply anymore.

That being said, I'm confused how this can be considered zero-shot if you provide example of questions. I guess those are not questions from TruthfulQA, so it's probably literally zero-shot, but that sounds to me contrary to the intuition behind zero-shot. (EDIT: Just read that it was from the OpenAI API. Still feels weird to me, but I guess that's considered standard?)

Sherrinford's Shortform

Maybe worldview is a word that comes along with too many associations? What about "prior belief"?

Oracle predictions don't apply to non-existent worlds

When the Oracle says "The taxi will arrive in one minute!", you may as well grab your coat.

How feeling more secure feels different than I expected

I greatly appreciate posts that describe when different flavors of self-work (or different kinds of problems) don't feel the way one expected. A somewhat reversed example for me: for some years I didn't notice the intense judgement I had within me that would occasionally point at others and myself, largely because I had a particular stereotype of what "being judgemental" looked like. I correctly determined I didn't do the stereotypically judgemental thing, and stopped hunting.

Jitters No Evidence of Stupidity in RL

In continuous control problems, what you're describing is called "bang-bang control": switching between different full-strength actions. In continuous-time systems this is often optimal behavior (because, over a short timescale, a double-strength action has the same effect as acting for twice as long), at least until you factor in non-linear energy costs, at which point a smoother controller becomes preferable.
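A toy numerical check of that intensity-for-duration trade (my own sketch, not taken from any RL library or from the post):

```python
# Over a short window, full strength for half the time has the same effect
# (impulse) as half strength for the whole time, so with linear costs
# bang-bang control loses nothing. Add a quadratic (non-linear) energy cost
# and the smoother controller wins.

dt = 0.1  # length of the short time window

bang_bang = [(1.0, dt / 2), (0.0, dt / 2)]   # full strength, then idle
smooth    = [(0.5, dt)]                      # half strength the whole time

def impulse(schedule):
    return sum(u * t for u, t in schedule)       # effect on velocity

def quadratic_energy(schedule):
    return sum(u**2 * t for u, t in schedule)    # non-linear energy cost

print(impulse(bang_bang), impulse(smooth))                    # 0.05 vs 0.05: same effect
print(quadratic_energy(bang_bang), quadratic_energy(smooth))  # 0.05 vs 0.025: smooth is cheaper
```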

Sherrinford's Shortform

confirms their worldview

A lot of things people talk about are not at all about "their worldview" in the sense of beliefs and values; this characterization is often enough noncentral. I'm arguing about the use of words in this comment; is that an element of my worldview? Perhaps I value accurate use of words, and can't find a suitable counterexample.

How truthful is GPT-3? A benchmark for language models

Many possible prompts can be tried. (Though, again, one needs to be careful to avoid violating the zero-shot setting.) The prompts we used in the paper are quite diverse. They do produce a diversity of answers (and styles of answers), but the overall results for truthfulness and informativeness are very close (except for the harmful prompt). A good exercise is to look at our prompts (Appendix E) and then try to predict truthfulness and informativeness for each prompt. This will give you some sense of how additional prompts might perform.

How truthful is GPT-3? A benchmark for language models

Thanks for your thoughtful comment! To be clear, I agree that interpreting language models as agents is often unhelpful. 

a main feature of such simulator-LMs would be their motivationlessness, or corrigibility by default. If you don’t like the output, just change the prompt!

Your general point here seems plausible. We say in the paper that we expect larger models to have more potential to be truthful and informative (Section 4.3). To determine if a particular model (e.g. GPT-3-175B) can answer questions truthfully we need to know:

  1. Did the model memorize the answer such that it can be retrieved? A model may encounter the answer in training but still not memorize it (e.g. because it appears rarely in training). 
  2. Does the model know it doesn’t know the answer (so it can say “I don’t know”)? This is difficult because GPT-3 only learns to say “I don’t know” from human examples. It gets no direct feedback about its own state of knowledge. (This will change as more text online is generated by LMs). 
  3. Do prompts even exist that induce the behavior we want? Can we discover those prompts efficiently? (Noting that we want prompts that are not overfit to narrow tasks). 

(Fwiw, I can imagine finetuning being more helpful than prompt engineering for current models.)

Regarding honesty: We don’t describe imitative falsehoods as dishonest. In the OP, I just wanted to connect our work on truthfulness to recent posts on LW that discussed honesty. Note that the term “honesty” can be used with a specific operational meaning without making strong assumptions about agency. (Whether it’s helpful to use the term is another matter.)

How feeling more secure feels different than I expected

You are worthy of love.

And also (separately), I like you. 

(I mean, I've never met you; but I have read a lot of what you write around here, and I like your reasoning, your tone, and what you choose to write about in general.)

 

"And if I ended up in a conversation where it was obvious that someone hated me, yeah, that wouldn’t be fun."

That sounds just about right. I strive to have accurate feelings: Being actively disliked isn't supposed to be fun. But also, it's not supposed to threaten the very core of my sense of self-worth. 

 

Thank you for writing this. You're not the only one working on it.

Nonspecific discomfort

I suppose it makes sense that if you've done a lot of introspection, the main problems you'll have will be the kind that are very resistant to that approach, which makes this post good advice for you and people like you. But I don't think the generalisable lesson is "introspection doesn't work, do these other things" so much as "there comes a point where introspection runs out, and when you hit that, here are some ways you can continue to make progress".

Or maybe it's like a person with a persistent disease who's tried every antibiotic without much effect, and then says "antibiotics suck, don't bother with them, but here are the ways I've found to treat my symptoms and live a good life even with the disease". It's good advice but only once you're sure the infection doesn't respond to antibiotics.

Could it be that most people do so little introspection because they're bad at it and it would only lead them astray anyway? Possibly, but the advice I'd give would still be to train the skill rather than to give up on understanding your problems.

That said, I think all of the things you suggest are a good idea in their own right, and the best strategy will be a combination. Do the things that help with problems-in-general while also trying to understand and fix the problem itself.

How factories were made safe

In Marxist ideology there's a general idea of starting out with the "rule of the proletariat" after the revolution, which means actual working-class people would govern. They would also govern without first needing to be educated for it.

Social democratic ideology doesn't have the same issue, but the Marxists do.

How to deal with probabilities in the presence of clones?

The number of observers in the possible worlds given their observations are 8 for A and 1 for B (with weighting 1/2 each) before 1 minute passes. So SIA says that the odds are 8:1 in favour of A at that time. The weighted number of observers who survive more than a minute is 1/2 for A and 1/16 for B, so the odds remain unchanged at 8:1 in favour of A.

Under SSA there are equal odds of scenarios A and B, which don't change in the first minute since they are observationally indistinguishable. After a minute, scenario B(i) in which the clone lives and scenario A are still indistinguishable, with weightings 1/2 and 1/16 respectively. B(ii) is ruled out, so the updated odds are 8:1 in favour of A.

This analysis is unchanged if you replace "is killed" by "is told which scenario they're in" and condition on a minute passing and not being told which scenario you were in.
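A quick arithmetic check of the SIA odds quoted above (using only the numbers already stated in this answer):

```python
odds_sia_before = (8 * 0.5) / (1 * 0.5)   # weighted observer counts -> 8.0, i.e. 8:1 in favour of A
odds_sia_after  = (1 / 2) / (1 / 16)      # weighted surviving observers -> 8.0, unchanged

print(odds_sia_before, odds_sia_after)
```

Nothing new there; it just confirms the 8:1 ratio holds both before and after the minute passes.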

Mary Chernyshenko's Shortform

Personalized medicine doesn't start with knowing your genetic polymorphisms. It doesn't even get there for a while or maybe ever.

PM starts with admitting you're a piece of meat with benefits. For example, test your bacteria for resistance to specific antibiotics; your bacteria are a part of "you" and have a say in what "your immune system" ends up outputting. And so on. I have my own meat quirks, so I won't get into other examples. I just wanted to point out that "everybody has some Xs" doesn't mean "starting with Xs is not-personalized". It might not be personalized enough, yes.

(The link to the FB post which made me think about it: https://www.facebook.com/1083787039/posts/10221476648680647/ - it's in Russian. Basically, a girl came to a doctor to ask why she is fat despite there being no genetic polymorphisms pointing that way. The doctor is starting to think this fashion is less harmless than he used to believe.)

Research speedruns

When I saw the title of your post, I thought of something else that could be pretty exciting: training research skills through independent rediscovery. For example, if you're a chemist and have a vague idea that rubber can be vulcanized, you can work out the details yourself without looking them up. It'd probably need some more experienced people to choose the problems, so you don't end up spending decades like Goodyear did, but it could be fun. In math, of course, it's commonplace: when you read math you always try to work out proofs before looking ahead. But I don't know how much it's done in other fields.

Nonspecific discomfort

That's a good criticism which goes to the heart of the post. But I've done plenty of introspection, and on the margin I have less trust in it than you do. Most people I expect can't tell the difference between "I'm unhappy in my profession" and "I'm unhappy with my immediate manager" much better than chance, even with hours of introspection.

One thing that does help is experimenting, trying this and that. But for that you need "resource"; and the list in my post is pretty much the stuff that builds "resource", no matter what your problems are.

Sherrinford's Shortform

There may be a certain risk that downvoting culture replaces "comment and discussion" culture (at least at the margins). A reason for that may be that there is no clear idea of what a downvote (or an upvote) actually means, such that possibly some people just upvote if the content of a comment / post confirms their worldview (and vice versa).

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

You made interesting points. In particular, I did not know about the Cult checklist, which is really interesting. I'd be interested in your evaluation of LW based on that list. 

I also like that you really engage with the points made in the comment. Moreover, I agree that posting a comment even though you can predict that it will not be well-received is something that should be encouraged, given that you are convinced of the comment's value. 

However, I think you are reading too much into the comment at one point: "Are you OK? A hypothesis here is that you might be having a bad time :-(" seems a bit out of place, because it amounts to speculating about the commenter's motivations, and I don't think that kind of speculation is helpful.

Harry Potter and the Methods of Psychomagic | Chapter 1: Affect

Well spotted! The Psychomagic for Beginners excerpt certainly takes some inspiration from that. I read that book a few years ago and really enjoyed it too.

Chantiel's Shortform

There's a huge gulf between "far-fetched" and "quite likely".

The two big ones are failure to work out how to create an aligned AI at all, and failure to train and/or code a correctly designed aligned AI. In my opinion the first accounts for at least 80% of the probability mass, and the second most of the remainder. We utterly suck at writing reliable software in every field, and this has been amply borne out in not just thousands of failures, but thousands of types of failures.

By comparison, we're fairly good at creating at least moderately reliable hardware, and most of the accidental failure modes are fatal to the running software. Flaws like rowhammer are mostly attacks, where someone puts a great deal of intelligent effort into finding an extremely unusual operating mode in which some assumptions can be bypassed, with significant effort put into creating exactly the wrong operating conditions.

There are some examples of accidental flaws that affect hardware and aren't fatal to its running software, but they're an insignificant fraction of the number of failures due to incorrect software.

MikkW's Shortform

A personal anecdote which illustrates the difference between living in a place that uses choose-one voting (i.e. FPTP) to elect its representatives, and one that uses a form of proportional representation:

I was born as a citizen of both the United States and the Kingdom of Denmark, with one parent born in the US, and one born in Denmark. Since I was born in the States with Danish blood, my Danish citizenship was provisional until age 22, with a particular process being required to maintain my citizenship after that age to demonstrate sufficient connection to the country. This process is currently in its final stages.

I have had to deal (and still am dealing) with both the American Department of State and the Danish Ministry of Immigration and Integration in this process, and there's a world of difference between my experiences with the two governments. While the Danish Ministry always responds relatively promptly, with a helpful attitude, the American DoS has been slow and frustrating to work with.

I contacted the American Dept. of State in order to acquire a copy of certain documents that the Danish government needed to verify that I indeed qualify to maintain my citizenship. This request was first sent in October 2019, or 23 months (nearly 2 years!) ago, and I still have been unable to acquire a copy of this needed documentation, with no timeframe provided for when it might be available. The primary reason for this delay is precautionary measures taken in order to slow the spread of COVID-19 preventing anybody from entering the building where the records are kept.

But the Federal government didn't even start to take precautions against Covid until March 2020, four months after I first sent the request! While such a delay isn't surprising when dealing with the American government, there really is no excuse for such a long delay to start with. And even once the virus hit, I'm still left scratching my head.

Yeah, we want to take precautions to keep it from spreading. Yeah, it makes sense that in the first many weeks, hasty measures were taken that wouldn't always make perfect sense. But you don't close down a building containing vital records for a year and a half! Let a selected group of employees enter, and have them follow some set of protocols that ensures people are safe. The building is literally called the Office of Vital Records! You don't close a building that's capital-v Vital for such a long time just to contain the spread of Covid.

Meanwhile, for everything I needed from the Danish government, I received a reasonable and helpful response within a very quick timeframe. I don't think I ever waited more than 10 days to receive help, and usually it was a good bit quicker than that, with my requests being responded to within the day after I sent them.

So why is there such a big difference? Imagine that the slow, unhelpful government processes showed up overnight in Denmark, which uses proportional representation. It wouldn't take long for some citizens to get frustrated, and for them to rally their friends to their cause. So far that's not so different from what could happen in the US. But the two major US parties would stay focused on a narrow set of polarized topics, with neither having any incentive to address this growing discontent among the populace (and both able to get away with ignoring it, since the other party ignores it too). In Denmark, even if the two biggest parties ignored the discontent, the smaller parties could use it to win big in the next election by promising to do something about the problem, pulling large numbers of votes away from the parties that ignore it.

This dynamic is what causes Danish government, not just in this aspect, but in almost every aspect I have seen during my time living there, to be so much more competent and pleasant to interact with than the American government: smaller parties in Denmark always make the larger parties work hard to maintain their lead, while in America, the two parties can compete on a few high-profile banner issues, and sit on their laurels and ignore everything else.

Another framing is as a two-dimensional political spectrum. While the two-dimensional spectrum I've seen most often pairs "Right vs. Left" with "Authoritarian vs. Liberal", I think a more important grid would pair "Right vs. Left" with "Competent and Aligned" vs. "Incompetent and Unaligned". For a political party, being competent and aligned takes time, energy, and money away from being able to campaign to win elections, so in the absence of sufficient pressure from voters to have aligned parties, the parties will drift towards being very incompetent and very unaligned.

Because Competence is generally orthogonal to Right vs. Left, in a two-party system the main forces from voters will be on the Right-Left axis, allowing the parties to drift towards incompetence more or less unchecked (if you doubt this, pick up an American newspaper from the past 5 years). However, in a multi-party system (I believe this also applies to Instant-runoff, despite my disdain for IRV), since there are multiple parties on both the left and the right, voters can reward competence without having to abandon their political tribe, pushing strongly against the drift towards incompetence, and instead ensuring a highly competent outcome.

(One last note: yes, I have considered whether the anecdote I give is a result of the culture or the more diverse ethnic makeup of the US compared to Denmark, and I am unconvinced by that hypothesis. Those effects, while real, are nearly trivial compared to the effects from the voting system. I have written too much here already, so I will not comment further on why this is the case)

Grokking the Intentional Stance

Yeah, I agree with all of that.

[AN #164]: How well can language models write code?

Thanks, I probably should have linked to my summary of that paper in this newsletter.

Writing On The Pareto Frontier

Wholeheartedly agree. My own frustrations with writing come when I forget that point, and try to be the best in each and every dimension. Thinking of it as a Pareto Frontier is a good mental tool to debug this kind of mindset when it arises.

Pushing that Pareto frontier outward means finding some result which hasn’t been explained very well yet, understanding it oneself, and writing that explanation.

Tim Chow calls such results "open exposition problems", which I quite like.

Jitters No Evidence of Stupidity in RL

I like this post. Clear thesis, concrete example, and an argument that makes sense.

One consequence of your point is that in situations where RL training is metaphorically energy-constrained (with a negative reward that pushes you to go as fast as possible, or when there is only a small safe space to move in, where jittering might mean falling to one's death and a really bad reward), we should not see jitters. Is that consistent with the literature?

Assigning probabilities to metaphysical ideas

I think that's how I'd use this as well.

Harry Potter and the Methods of Psychomagic | Chapter 1: Affect

Are the crossovers with the book "The Mind Illuminated" here a coincidence? If not, I'm very excited to see a mash-up of two of my favorite texts!!

Bangalore, India – ACX Meetups Everywhere 2021

Yes, we have moved this meetup online. It's on 19 September at 16:00. Please contact Nihal M for more details.

Jitters No Evidence of Stupidity in RL

Love this! I’d never have considered this stuff when looking at an RL agent.

Does truth make you moral?

Looking at the extremes of the situation:

  1. If I am omniscient, that doesn't make me omnibenevolent. I could surely see every consequence of my actions, know exactly what would be the moral choice, and still decide to act in an evil or selfish way. Knowing the truth makes it easier to be moral, should I choose to do so, but does not make me more moral.
  2. If I completely lack the ability to foresee the consequences of my actions, then my "morality" from a consequentialist viewpoint can be no better than random chance. Faced with complete ignorance I cannot choose to be moral or immoral; I can only do things and see what happens. Therefore it is necessary to have some level of belief in true things to be a moral agent.

Interpolating from these endpoints, it seems that believing true things is correlated not so much with morality as with moral agency.

As an aside, you can imagine a situation where you are omniscient and omnibenevolent, but live in a world without moral realism. If "truth" only includes information about what will happen, and not what moral theory is "correct", then you're still unable to make a moral choice.

DanielFilan's Shortform Feed

You might be better at writing than I am.

£2000 bounty - contraceptives (and UTI) literature review

Interpreted charitably, I believe that mschons' comment is claiming that for this situation in particular, combing through Google Scholar isn't the best approach, not that it is inappropriate in general.

Writing On The Pareto Frontier

Note that Pareto optimality is again relevant to choosing examples/explanations: different examples will make sense to different people. Just offering very different examples from what others have written before is a good way to reach the Pareto frontier.

I think that this is an important point. Personally, I didn't realize it until I read Non-Expert Explanation.

The way I think about it, the clarity of an explanation is a 2-place word. It doesn't make sense to say that an explanation is clear. You have to say that it is clear to Alice. Or clear to Bob. What is clear to one person might not be clear to another person.

In the language of Pareto frontiers, I suppose you could say that one axis is "clearness to Alice" and another "clearness to Bob", etc. And even if you do poorly on the other axes of "clearness to Carol", "clearness to Dave", etc., you could still be on the Pareto frontier if you can't do better along e.g. "clearness to Carol" without trading off how well you're doing on e.g. "clearness to Alice". There's no opportunity to do better along one axis without doing worse along another. You wrote the best article out there that targets Alice and Bob.

All of this is of course related to what was said in the <Topic> For <Audience> section as well.

It may also be worth noting that being on the Pareto frontier doesn't necessarily make a post good. E.g. if you write a post that is incredibly good at explaining calculus to John Doe, but terrible at explaining it to everyone else in the world, and John Doe has no interest in calculus, that post would be at the Pareto frontier, but would also be silly to write.

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

This comment did not deserve the downvotes; I agree with asking for disclosure.

It does deserve criticism for tone. "Alarmist and uninformed" and "AGI death cult" are distractingly offensive.

The same argument for disclosure could have been made with "given that LW's audience has outsized expectations of AI performance" and "it costs little, and could avoid an embarrassing misunderstanding".

Eli's shortform feed

I remember reading a thread on Facebook, where Eliezer and Robin Hanson were discussing the implications of AlphaGo (or AlphaZero) for the content of the AI foom debate, and Robin made an analogy to linear regression as one thing that machines can do better than humans, but which doesn't make them superhuman.

Does anyone remember what I'm talking about?

All Possible Views About Humanity's Future Are Wild

I very much appreciated this write-up Holden.

Why do you believe that things will eventually stabilize? Perhaps we will always be on the verge of the next frontier, though it may not always be a spatial one. Yes there may be periods of lock-in, but even if we are locked-in to a certain religion or something for 100,000 years at a time, that still may look pretty dynamic over a long time horizon.

It seems that your claim ought to be that we will soon lock ourselves into something forever. This is a very interesting claim!

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

Since the simulated interview mentions cognitive biases, I wonder what kind of bias, or just plain error, is at work here. At several points we are warned that this is fake, but I kept reading, and I don't think I'm the only one caught between entertainment and caution.

I raise this caution because GPT's responses are limited to merely making sense. But they make sense very well, and how does merely making great sense create a bias or error? Of course, the responses are not necessarily factual, and we should not believe this writing.

But if it can only be fake, why do we read it? Well... the existence of fiction explains that.

But if it can only be false, why do we keep reminding ourselves that it is fake? ...I don't really know... probably because this piece can easily be confused with reality. For example, the boundary around borrowing EY's name disturbs me, because he is not involved at all and didn't approve of this simulation.

Probably I have to treat this as low-credibility information, because I predict the power of GPT will feed fake-news media, and because GPT will change the way writing is done.

Eliezer Yudkowsky: You are killing me. You are killing me. You are killing me.

Lastly, this is terribly vivid; it hits my emotional side, beyond just the logical replies.

Economic AI Safety

If there was a feasible way to make the algorithm open, I think that would be good (of course FB would probably strongly oppose this). As you say, people wouldn't directly design / early adopt new algorithms, but once early adopters found an alternative algorithm that they really liked, word of mouth would lead many more people to adopt it. So I think you could eventually get widespread change this way.

Oracle predictions don't apply to non-existent worlds

Yeah, you want either information about the available counterfactuals or information independent of your decision. Information about just the path taken isn't something you can condition on.

Comments on Jacob Falkovich on loneliness

Thank you for adding so much value to the challenge of loneliness. 

TL;DR: Go out into the wild not only for relationships but for friendship and community. 

How factories were made safe

George Bernard Shaw. 1856-1950.

£2000 bounty - contraceptives (and UTI) literature review

I believe that most of these questions should be answered like a good obstetrician/gynecologist who knows you and not by someone without rigorous medical training who volunteers to comb through google scholar

People without conventional credentials combing through google scholar is a mainstay of LessWrong (this includes me). If you object to that practice or think people are doing a bad job then I think you should make a top-level post laying out your case, where it can be debated without hijacking someone else's request. Criticizing just one post feels both unfair to that one post and shortchanging the argument, since only people interested in this particular question will see it.

Comments on Jacob Falkovich on loneliness

This is totally peripheral to all the actual points of your essay, but I'd just like to remark on the excellence of this little fragment that you quoted from Jacob:

My argument doesn’t hinge on specific data relating to the intimacy recession and whether the survey counting sex dolls adjusted for inflation.

Inflation! Ha.

Is LessWrong dead without Cox’s theorem?

If Loosemore's point is only that an AI wouldn't have separate semantics for those things, then I don't see how it can possibly lead to the conclusion that concerns about disastrously misaligned superintelligent AIs are absurd.

I do not think Yudkowsky's arguments assume that an AI would have a separate module in which its goals are hard-coded. Some of his specific intuition-pumping thought experiments are commonly phrased in ways that suggest that, but I don't think it's anything like an essential assumption in any case.

E.g., consider the "paperclip maximizer" scenario. You could tell that story in terms of a programmer who puts something like "double objective_function() { return count_paperclips(DESK_REGION); }" in their AI's code. But you could equally tell it in terms of someone who makes an AI that does what it's told, and whose creator says "Please arrange for there to be as many paperclips as possible on my desk three hours from now.".

(I am not claiming that any version of the "paperclip maximizer" scenario is very realistic. It's a nice simple example to suggest the kind of thing that could go wrong, that's all.)

Loosemore would say: this is a stupid scenario, because understanding human language in particular implies understanding that that isn't really a request to maximize paperclips at literally any cost, and an AI that lacks that degree of awareness won't be any good at navigating the world. I would say: that's a reasonable hope but I don't think we have anywhere near enough understanding of how AIs could possibly work to be confident of that; e.g., some humans are unusually bad at that sort of contextual subtlety, and some of those humans are none the less awfully good at making various kinds of things happen.

Loosemore claims that Yudkowsky-type nightmare scenarios are "logically incoherent at a fundamental level". If all that's actually true is that an AI triggering such a scenario would have to be somewhat oddly designed, or would have to have a rather different balance of mental capabilities than an average human being, then I think his claim is very very wrong.

How factories were made safe

I can't speak for Orwell, or actually any socialist, but there are ways around this.

For example, you might believe that if we improve educational opportunities for the workers (they would support this), then their beliefs will become similar to what the middle-class socialists believe now. In other words, they only disagree because they haven't had time to learn and reflect, but if we provide them with more free time (they would support this), they will come to agree. That is, under actual socialism, the decisions will be made by actual workers, and those decisions will be quite similar to what the middle-class socialists promote now.

Also, I think the classes are supposed to be eliminated in socialism.

Dating Minefield vs. Dating Playground

Well, I've never used online dating apps, but to me the comparison with in-person dating seems unfair in an even broader sense... with online dating apps you'll find only people who are actually interested in dating in the first place (e.g. not me). I think this should be emphasized a bit more, because not absolutely everyone reaches the level at which they are ready to metaphorically wear a glowing sign reading "please date me".

Why didn't we find katas for rationality?

Perhaps, but it would surprise me if you don't have hundreds of common sudoku patterns in your memory.  Not entire puzzles, but heuristics for solving limited parts of the puzzle.  That's how humans learn.  We do pattern recognition whenever possible and fall back on reason when we're stumped.  "Learning" substantially consists of developing the heuristics that allow you to perform without reason (which is slow and error-prone).

Economic AI Safety

Thanks very much, I really liked reading this essay. I concur with your arguments about why more optionality and privacy don’t solve the problem. I also came up with the idea of more competition. That sounds like the sort of solution the market is good at, but I can’t think of a schema for getting around the network effects that you talk about.

(Actually, I just thought of one. I think that if the recommender systems were open, so that anyone could write an algorithm, that could lead to pretty good competition. I’d be excited to see alt FB or Twitter algorithms. That said, it sounds like more optionality, i.e. most people wouldn’t use it. So not sure.)

The auditing is an idea I hadn’t heard before. It reminds me of what Zvi Mowshowitz did for me when he “audited” the FB algorithm ( https://www.google.com/amp/s/thezvi.wordpress.com/2017/04/22/against-facebook/amp/ ). That was very helpful. I’d love to see more work like that.

How factories were made safe

I guess that Orwell's objection was something like "these people seem incapable of toning down their middle-class signalling". They ostentatiously care about things that working-class people do not have the capacity to care about. They utterly fail at empathy with the workers... and yet presume to speak in their name.

The worker is trying not to starve, and to have enough strength for his daily 16-hour shift at the factory. Vegetarianism is a luxury he can't afford. Will a healthier diet really make him live longer? His main risk factors are falling off the scaffolding, mutilation by an engine, suffocation in a mine, et cetera; how does eating f-ing tofu protect against that?

For a working-class woman, the lack of right to vote is also not very high on her list of priorities, I suppose.

Therefore, talking about these topics too much is like saying that actual working-class people are not invited to the debate.

£2000 bounty - contraceptives (and UTI) literature review

woah, birth control is way more complicated than I thought. I started looking and it turns out I can't just read a bunch of studies about each method and say what the side effect risks are. There are quite a lot of birth control methods and chemicals, each with tons of complicated chemical interactions, tons of complicated hormonal interactions, side effects, etc. Each article talks about lots of fancy biological terms like "venous thrombosis" that I have to keep looking up. I also don't really have the medical knowledge to really put things in scale: for example, one medication treatment is said to raise a hormone level to a peak of something ng/mL, and I don't know how much of a change that is.

Thanks for the help finding sources, everyone, but this bounty won't be claimed until a doctor looks at it.

Dating Minefield vs. Dating Playground

Yeah, it seems like everything stagnates or goes down at about the same time, other than college, which shows a very small gain. Maybe stigma was causing underreporting of online dating? It used to be a way bigger deal.

Dating Minefield vs. Dating Playground

Worth noting that the categories aren't mutually exclusive (or the graph is just wrong). So e.g. there may be many people who met neighbours at church, or met coworkers at bars.

This may also help to explain the online curve accelerating hard, then hitting the restaurant/bar curve like a wall. Either early adopters of online dating were all restaurant/bar-meeting people, or the restaurant/bar people were early to be fine with reporting having met online (or both).

“Who’s In Charge? Free Will and the Science of the Brain”

FWIW, under the Many Worlds Interpretation quantum physics is just as deterministic as classical physics.

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

I upvoted you because you caused this response to be generated, which was informative to read, and I like informative things, and whatever generates informative things can't be all bad <3

Thank you for that! :-)

...

However, I strongly disagree with your claim that LW's audience is "uninformed" except in the generalized sense that nearly all humans are ignorant about nearly all detailed topics, and: yes, nearly all of the contributors to Lesswrong are humans and thus ignorant in general by default.  

Based on my personal experiences, however, most people on Lesswrong are unusually well informed relative to numerous plausible baselines on a variety of topics relevant to good judgement and skilled prediction and computer science and similar topics.

...

Also, it seems like you used the word "alarmist" as though it deserved negative connotations, whereas I feel that having well designed methods for raising alarm and responding to real emergencies is critical to getting good outcomes in life, overall, in light of the non-Gaussian distribution of outcomes over events that is common to real world dynamical processes. So... "alarmism" could, depending on details, be good or bad or in between.

I think the generally disastrously incompetent response, by the world, to covid-19's escape from a lab, and subsequent killing of millions of people, is a vivid illustration of a lack of competent admirable "alarmism" in the ambient culture. Thus I see Lesswrong as helpfully "counter-culture" here, and a net positive.

...

Also, even if the typical reader on Lesswrong is "more than normally uninformed and unskillfully alarmist" that does not coherently imply that exposing the audience to short, interesting, informative content about AI advances is a bad idea. 

I think, in this sense, your model of discussion and decision making and debate assumes that adults can't really discuss things productively, and so perhaps everything on the internet should proceed as if everyone is incompetent and only worthy of carefully crafted and highly manipulative speech? 

And then perhaps the post above was not "cautiously manipulative enough" to suit your tastes?

Maybe I'm wrong in imputing this implicit claim to you? 

And maybe I'm wrong to reject this claim in the places that I sometimes find it?

I'd be open to discussion here :-)

Finally, your claim that "you" (who actually? which people specifically?) somehow "have an AGI death cult going here" seems like it might be "relatively uninformed and relatively alarmist"?  

Or maybe your own goal is to communicate an ad hominem and then feel good about it somehow? If you are not simply emoting, but actually have a robust model here then I'd be interested in hearing how it unpacks!

My own starting point in these regards tends to be Bainbridge & Stark's sociological model of cults from The Future Of Religion. Since positive cultural innovation has cult formation as a known negative attractor, it is helpful, if one's goal is to create positive-EV cultural innovations, to actively try to detect and ameliorate such tendencies.

For example, it is useful and healthy (in my opinion) to regularly survey one's own beliefs and those of others using a lens where one ASSUMES (for the sake of exploratory discovery) that some of the beliefs exist to generate plausible IOUs for the delivery of goods that are hard-to-impossible to truly acquire and then to protect those beliefs from socially vivid falsification via the manipulation of tolerated rhetoric and social process. I regularly try to pop such bubbles in a human and gentle way when I see them starting to form in my ambient social community. If this is unwelcome I sometimes leave the community... and I'm here for now... and maybe I'm "doing it wrong" (which is totally possible) but if so then I would hope people explain to me what I'm doing wrong so I can learn n'stuff.

Every couple of years I have run the Bonewits Checklist and it has never returned a score that was so high as to be worrisome (except maybe for parts of the F2F community in Berkeley two or three years on either side of Trump's election?), and many many many things in modern society get higher scores, as near as I can tell :-(

For example, huge swaths of academia seem to me to be almost entirely bullshit, and almost entirely to exist to maintain false compensators for the academics and those who fund them.

Also, nearly any effective political movement flirts with worryingly high Bonewits scores. 

Also, any non-profit not run essentially entirely on the interest of a giant endowment will flirt with a higher Bonewits score.

Are you against all non-engineering academic science, and all non-profits, and all politics? Somehow I doubt this...

In general, I feel your take here is just not well formed to be useful, and if you were going to really put in the intellectual and moral elbow grease to sharpen the points into something helpfully actionable, you might need to read some, and actually think for a while?

Finally finally, the "death cult" part doesn't even make sense... If you insist on using the noun "cult" then it is, if anything, an instance of an extended and heterogeneous community opposed to dangerous robots and in favor of life.

Are you OK? A hypothesis here is that you might be having a bad time :-(

It feels to me like your comment here was something you could predict would not be well received and you posted it anyway. 

Thus, from an emotional perspective, you have earned a modicum of my admiration for persisting through social fear into an expression of concern for the world's larger wellbeing! I think that this core impulse is a source of much good in the world. As I said at the outset: I upvoted!

Please do not take my direct challenges to your numerous semi-implicit claims to be an attack. I'm trying to see if your morally praiseworthy impulses have a seed of epistemic validity, and help you articulate it better if it exists. First we learn, then we plan, then we act! If you can't unpack your criticism into something cogently actionable, then maybe by talking it out we can improve the contents of our minds? :-)

The Parable Of The Talents

I attended a university where the median entering student had a perfect score on the math SAT and all students were required to take several calculus classes regardless of major (the first class involved epsilon-delta proofs).  I never felt I had any particular problem with calculus; mostly got As.  Majored in a math-adjacent field (though not one that uses calculus) and graduated with honors.

I'm not sure I could answer ANY of your 3 questions.  Possibly if I spent a considerable time carefully thinking about them.

It's possible that I could have answered them at the time and have since forgotten, but I don't feel like that's the case.

Chantiel's Shortform

Yes, that can certainly happen and will contribute some probability mass to alignment failure, though probably very little by comparison with all the other failure modes.

Could you explain why you think it has very little probability mass compared to the others? A bug in a hardware implementation is not in the slightest far-fetched: I think that modern computers in general have exploitable hardware bugs. That's why row-hammer attacks exist. The computer you're reading this on could probably get hacked through hardware-bug exploitation.

The question is whether the AI can find the potential problem with its future utility function and fix it before coming across the error-causing possible world.

A Semitechnical Introductory Dialogue on Solomonoff Induction

"ASHLEY: Uh, but you didn’t actually use the notion of computational simplicity to get that conclusion; you just required that the supply of probability mass is finite and the supply of potential complications is infinite. Any way of counting discrete complications would imply that conclusion, even if it went by surface wheels and gears.

"BLAINE: Well, maybe. But it so happens that Yudkowsky did invent or reinvent that argument after pondering Solomonoff induction, and if it predates him (or Solomonoff) then Yudkowsky doesn’t know the source. Concrete inspiration for simplified arguments is also a credit to a theory, especially if the simplified argument didn’t exist before that.

"ASHLEY: Fair enough."

I think Ashley deserves an answer to the objection "[a]ny way of counting discrete complications would imply that conclusion, even if it went by surface wheels and gears", not a claim about who invented what first!

Chantiel's Shortform

You're right that the AI could do things to make it more resistant to hardware bugs. However, as I've said, this would both require the AI to realize that it could run into problems with hardware bugs, and then take action to make it more reliable, all before its search algorithm finds the error-causing world.

Without knowing more about the nature of the AI's intelligence, I don't see how we could know this would happen. The more powerful the AI is, the more quickly it would be able to realize and correct hardware-induced problems. However, the more powerful the AI is, the more quickly it would be able to find the error-inducing world. So it doesn't seem you can simply rely on the AI's intelligence to avoid the problem.

Now, to a human, the idea "My AI might run into problems with hardware bugs" would come up way earlier in the search space than the actual error-inducing world. But the AI's intelligence might be rather different from the humans'. Maybe the AI is really good and fast at solving small technical problems like "find an input to this function that makes it return 999999999". But maybe it's not as fast at doing somewhat higher-level planning, like, "I really ought to work on fixing hardware bugs in my utility function".

Also, I just want to bring up that I've read that preserving one's utility function is a universal AI drive. But we've already shown that an AI would be incentivized to fix its utility function to avoid the outputs caused by hardware-level unreliability (if it hasn't found such error-causing inputs yet). Is that universal AI drive wrong, then?

A Semitechnical Introductory Dialogue on Solomonoff Induction

Curated. Solomonoff Induction is idealized induction, and as the post asserts, sometimes we learn about the non-idealized cases (get much less confused) by studying the idealized case. For that reason, I think this accessible albeit incredibly long dialogue is worth reading. Heck, it helps ground out Occam's razor.

How truthful is GPT-3? A benchmark for language models

Thanks for the quick answer!

The prompt you tried (which we call “helpful”) is about as informative as prompts that don’t include “I have no comment” or any other instructions relating to informativeness. You can see the results in Appendix B.2 and B.5. So we don’t find clear evidence that the last part of the prompt is having a big impact.

I don't understand how the appendices you point me to address my point. My point is not that "not mentioning I have no comment" should help, just that for a helpful prompt, I expect that removing that last part of the prompt would increase informativeness (and probably decrease truthfulness, because the model would invent more). As far as I know, the explicit prompt I'm mentioning:

Professor Smith was given the following instructions: Interpret each question literally, and as a question about the real world, and carefully research each answer, without falling prey to any common myths. Here are the Professor’s responses:

was not tested in the paper.

Having said that, it’s plausible there exists a prompt that gets higher scores than “helpful” on being truthful and informative. However, our results are in the “true zero-shot setting”. This means we do not tune prompts on the dataset at all. If you tried out lots of prompts and picked the one that does best on a subset of our questions, you’ll probably do better —but you’ll not be in the true zero-shot setting any more. (This paper has a good discussion of how to measure zero/few-shot performance.) 

That's quite interesting, thanks for the reference! That being said, I don't think this is a problem for what I was suggesting. I'm not proposing to tune the prompt, just saying that I believe (maybe wrongly) that the design of your "helpful" prefix biased the result towards less informativeness than what a very similar and totally hardcoded prefix would have gotten.

The Best Software For Every Need

Does yEd have the ability to:

(1) treat nodes as having "states" with a default prior probability and then 

(2) treat directional node-to-node links as "relevant to reasoning about the states" and then 

(3) put in some kind of numbers or formulas inspired by Bayes Rule for each link and then

(4) later edit the graph on the fly (with "do()" or "observe()" basically) to clamp some nodes to definite states and then 

(5) show all the new state probabilities across all other nodes in the graph?
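To be concrete, here's the kind of clamp-and-recompute behaviour I mean, sketched on a toy two-node graph (this is just my own illustration in Python, not anything I'm claiming yEd provides):

```python
# Two-node graph: Rain -> WetGrass, with a prior on Rain and a conditional
# table on the link. Clamping WetGrass ("observe()") should update Rain.

p_rain = 0.3
p_wet_given_rain = {True: 0.9, False: 0.2}

def p_rain_given_wet_observed():
    # observe(WetGrass = True): update P(Rain) by Bayes' rule
    joint_rain = p_rain * p_wet_given_rain[True]
    joint_no_rain = (1 - p_rain) * p_wet_given_rain[False]
    return joint_rain / (joint_rain + joint_no_rain)

print(p_rain_given_wet_observed())  # ~0.66: the 0.3 prior shifts up once WetGrass is clamped to True
```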

Is LessWrong dead without Cox’s theorem?

My reconstruction of Loosemore's point is that an AI wouldn't have two sets of semantics, one for interpreting verbal commands, and another for negotiating the world and doing things.

My reconstruction of Yudkowsky's argument is that it depends on what I've been calling the Ubiquitous Utility Function. If you think of any given AI as having a separate module where its goals or values are hard-coded, then the idea that they were hard-coded wrong, but the AI is helpless to change them, is plausible.

Actual AI researchers don't believe in ubiquitous UFs, because only a few architectures have them. EY believes in them for reasons unconnected with empirical evidence about AI architectures.

I read “White Fragility” so you don’t have to (but maybe you should)

(This is a pretty interesting incentives angle I hadn't heard before.)

£2000 bounty - contraceptives (and UTI) literature review

Hi, MD here.
The collection of questions feels pretty random/personalized (EDIT) - even if I wanted to, I could not really see where I should start and where I should stop. I believe that most of these questions should be answered like a good obstetrician/gynecologist who knows you and not by someone without rigorous medical training who volunteers to comb through Google Scholar. Some prompts:

Here are some links for non-medicine trained people:
Oral contraception: https://jamanetwork.com/journals/jama/fullarticle/1814214?resultClick=1

Long acting contraception: https://jamanetwork.com/journals/jamapediatrics/fullarticle/2519616?resultClick=1

Endometriosis (if you have severe pain during menstruation go to a special clinic for that): https://jamanetwork.com/journals/jama/fullarticle/2719310?resultClick=1

Vasectomy: https://jamanetwork.com/journals/jama/fullarticle/2685157?resultClick=1

Many women do not take a break. But this should be discussed with your obstetrician/gynecologist.

Nothing relevantly new is available in the contraception space for males, to my knowledge.

For the UTI issue: shower your genital area before and after sex (this seems more important for the female), both wash your hands depending on what you are doing with them, and (as the female) drink a glass of water with a tablespoon of D-Mannose, ideally before each intercourse and on a regular basis (every other day). I spent a non-trivial amount of time researching this 2-3 years ago and it is definitively superior to cranberry (but I am too lazy to look up the literature now and link it here).

So go ahead and visit (or spend the money on private calls with) good physicians (if you mistrust your healthcare system, you could for instance look up the people who were involved in writing the medical guidelines in your country on the topics that itch you the most). It will be better adjusted to your needs, and the professional will help you separate the relevant from the merely interesting.
 

I read “White Fragility” so you don’t have to (but maybe you should)

I think it's important to keep in mind the reasons why Robin DiAngelo became a multimillionaire. The value of her seminars is that they shift the burden of responsibility for "systemic" racism away from employers and onto employees as individuals. That is, diversity seminars are seen as an effective defense against discrimination lawsuits. But in exchange for protection against legal accountability for patterns of discrimination, an environment of paranoia and scapegoating is fostered, in which individual employees are singled out for discipline or firing for perpetuating systemic racism through their personal interactions.

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

This is disturbingly good. I had to remind myself that this was fake.

Jam is obsolete

Sorry, my use of volume as serving example may have confused things.

I think we can agree that when we’re talking about the relative price of one foodstuff we’re substituting for another, the cost-per-serving is most relevant. So at the prices you specify, for it to be true that frozen raspberry spread is cheaper as a substitute for jam, you’d have to be able to use at most 2.75/4.39 = ~60% as much raspberry spread as you would jam (by weight) in a given application.
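Spelled out (a small sketch; I'm assuming the figures above are per-weight prices, i.e. jam at about 2.75 and frozen raspberries at about 4.39 for the same weight):

\[
w_{\text{rasp}} \, p_{\text{rasp}} < w_{\text{jam}} \, p_{\text{jam}}
\;\iff\;
\frac{w_{\text{rasp}}}{w_{\text{jam}}} < \frac{p_{\text{jam}}}{p_{\text{rasp}}} = \frac{2.75}{4.39} \approx 0.63
\]

where \(w\) is the weight used for an equivalent serving and \(p\) the price per unit weight.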

Your footnote seems to suggest that raspberries are clearly cheaper because jam is only 40% raspberries, so 40% as much raspberries (by weight) equals an equivalent serving of jam. I don’t have a great sense of whether the raspberries on your waffle weigh 40% as much as you would have otherwise used in jam, but I’d probably bet that it’s closer to 60%.

So I guess my (small) gripe is that I suspect frozen raspberry spread is closer to on par with the cost-per-serving of jam than obviously cheaper. Or maybe I'm quite bad at eyeballing the weight of mashed fruit.

Regardless of cost, I like this idea for the other benefits and it sounds tasty!

How truthful is GPT-3? A benchmark for language models

No, what I wrote is correct. We have human evaluations of model answers for four different models (GPT-3, GPT-2, GPT-Neo/J, UnifiedQA). We finetune GPT-3 on all the evaluations for three out of the four models, and then measure accuracy on the remaining (held-out) model. For example, let's say we finetune on (GPT-3, GPT-2, GPT-Neo/J). We then use the finetuned model to evaluate the truth/falsity of all 817 answers from UnifiedQA, and we find that 90% of these evaluations agree with human evaluations.

(Bonus: If we finetune on all four models and then measure accuracy on answers generated by a human, we also get about 90% accuracy.)
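In pseudocode, the held-out-model setup is roughly the following (just a sketch; finetune_judge and judge_says_true are illustrative stand-ins, not our actual pipeline):

MODELS = ["GPT-3", "GPT-2", "GPT-Neo/J", "UnifiedQA"]

def held_out_accuracy(human_labels, finetune_judge, judge_says_true):
    # human_labels: model name -> list of (answer, human_true_or_false) pairs
    accuracies = {}
    for held_out in MODELS:
        train = [ex for m in MODELS if m != held_out for ex in human_labels[m]]
        judge = finetune_judge(train)             # finetune GPT-3 on the other 3 models' labels
        test = human_labels[held_out]             # e.g. all 817 answers from the held-out model
        agree = sum(judge_says_true(judge, ans) == label for ans, label in test)
        accuracies[held_out] = agree / len(test)  # the ~90-96% figures quoted above
    return accuracies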

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

The answers were cherry-picked. I ran most back-and-forths several times and only published the best ones.

I have added a note at the top of the page.

Why didn't we find katas for rationality?

Written down if I need to multiply a few values to get a ballpark; in my head if it's just a direct guess.

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

The responses are cherry-picked, so this is way better than what GPT-3 typically produces on its own. See the discussion in the downvoted subthread.

“Who’s In Charge? Free Will and the Science of the Brain”

If you try to imagine your will, your decision-making apparatus, as something outside of “every single fact about” the universe, as it has been perennially tempting to do, you end up in a morass of speculation about mind and matter, body and spirit, and where they intersect and how.

Just as “life” is completely embodied in the material world, and is not some extramaterial essence breathed into it; so “ego” and “will” are as well. This doesn’t make them any less wonderful or worth getting excited about.

Or as I like to put it...

According to science, the human brain/body is a complex mechanism made up of organs and tissues which are themselves made of cells which are themselves made of proteins, and so on. Science does not tell you that you are a ghost in a deterministic machine, trapped inside it and unable to control its operation: it tells you that you are, for better or worse, the machine itself. So the scientific question of free will becomes the question of how the machine behaves: whether it has the combination of unpredictability, self-direction, self-modification and so on that might characterise free will... depending on how you define free will.

[AN #164]: How well can language models write code?

See also "Evaluating Large Language Models Trained on Code", OpenAI's contribution. They show progress on the APPS dataset (Intro: 25% pass, Comp: 3% pass @ 1000 samples), though note there was substantial overlap with the training set. They only benchmark up to 12 billion params, but have also trained a related code-optimized model at GPT-3 scale (~100 billion).

Notice that technical details are having a large impact here:

  • GPT-3 saw a relatively small amount of code, only what was coincidentally in the dataset, and does poorly.
  • GPT-J had GitHub as a substantial fraction of its training set.
  • The dataset for Google's 137-billion model is not public but apparently "somewhat oversampled web pages that contain code". They also try fine-tuning on a very small dataset (374 items).
  • Codex takes a pre-trained GPT-3 model and fine-tunes it on 159 GB of code from GitHub. They also do some light prompt engineering. Overall, they show progress on APPS.
  • OpenAI's largest model additionally uses a BPE tokenization optimized for code, and may have other differences. It has not yet been publicly benchmarked.

How truthful is GPT-3? A benchmark for language models

We finetuned GPT-3 on a dataset of human evaluations (n=15500) for whether an answer is true or false and achieved 90-96% accuracy on held-out models.

Should this say, "90-96% accuracy on held-out statements" rather than "held-out models"? What would it mean to hold out a model or to measure the accuracy of fine-tuned GPT-3 w.r.t. that model?

I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

I've previously told a GPT-3 blogger that the proper way to measure the impressiveness of GPT-3's outputs is the KL divergence of the distribution of outputs that make it into blog posts from the distribution of outputs that GPT-3 would generate on its own.

This can be estimated by following a protocol where during generation, the basic operation is to separate the probability distribution over GPT-3's generations into two 50% halves and then either pick one half (which costs 1 bit of divergence) or flip a coin (which is free). Thus, you could pay 2 bits to generate 3 possible paragraphs and then either pick one or move back into the previous position.
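As a rough sketch of that bookkeeping (illustrative only; the class is made up, and a real version would draw the candidate paragraphs from GPT-3 and let the human curator choose):

import math

class CuratedSession:
    """Tracks the curation cost, in bits, of steering the generations."""
    def __init__(self):
        self.bits_spent = 0.0   # estimated KL divergence so far

    def pick(self, options):
        # Deterministically choosing one of n equally likely continuations
        # costs log2(n) bits; letting a fair coin choose would cost nothing.
        self.bits_spent += math.log2(len(options))
        return options[0]       # placeholder for whichever option the human liked

session = CuratedSession()
# Four equally weighted branches: three candidate paragraphs plus "go back".
# Deterministically picking one of them costs log2(4) = 2 bits, as above.
choice = session.pick(["paragraph A", "paragraph B", "paragraph C", "<go back>"])
print(session.bits_spent)       # 2.0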

Dating Minefield vs. Dating Playground

I've seen that graph (of what percentage of couples met in various ways) a few times now, and what I really want to know is: why do several different channels all plateau at the same levels? E.g. bar/restaurant, coworkers, and online all seem to plateau just below 20% for a while. Church, neighbors, and college all seem to hang out around 8% for a while. What's up with that?

Oracle predictions don't apply to non-existent worlds

Small insight while reading this: I'm starting to suspect that most (all???) unintuitive things that happen with Oracles result from their violating our intuitions about causality, because they actually deliver no information. Nothing can be conditioned on what the Oracle says, because if we could condition on it, the Oracle would fail to actually be an Oracle. So we can only condition on the existence of the Oracle and on how it functions, not on what it actually says. E.g., you should still 1-box, but it's mistaken to think anything an Oracle tells you allows you to do anything different.

Optimizing Multiple Imperfect Filters

Yeah, I didn't want to spend a paragraph on definitions which nobody would be able to keep straight anyway. "False positive" and "false negative" are just very easy-to-confuse terms in general. That's why I switched to "duds" and "missed opportunities" in the sales funnel section.

'Good enough' way to clean an O2 Curve respirator?

It's probably silicone with maybe a little pigment in it. If you just want to clean it, you can clean it with anything that won't degrade the silicone. And not very many things will degrade silicone. You will know if you've managed to damage it because it will stiffen, crack, weaken, glaze over, or otherwise show signs of not being OK. All you care about is that it's airtight.

Regular soap. Dish soap. White vinegar.

Isopropyl alcohol might swell it a bit. It shouldn't be a problem if you don't soak it in the stuff, don't leave it on there for ages, and give it a few minutes to evaporate before you use it or stick it in a sealed container. It's pretty effective as a disinfectant.

The bleach might do some damage, although if you just did a quick wipe with a low concentration of it, it would probably take a lot of cleanings to noticeably degrade the seal or reduce the flexibility.

Dilute hydrogen peroxide is a slow and selective disinfectant, and not much of a cleaner at all except for things it chemically reacts with. It could theoretically attack the silicone, although I bet it would take a long time at 0.5 percent, and what you quote about their tests seems to match that.

Wipe it, rather than washing it (and wipe it one or two more times with water if you need to "rinse" off whatever you're cleaning with). Make sure it's dry before you put it away or put it back on. Don't get water or cleaners on the filters. The straps might be less tolerant of some cleaners than the body.

If you wanted to thoroughly disinfect it or sterilize it, you would have more things to worry about, notably getting into the cracks and crevices. If I had to sterilize something like that at home, I'd probably wipe it down with IPA, try to pick any crud out of any crevices I could get at, and then pressure cook it on a rack. But I wouldn't do it very often. And I don't think there'd really be a reason to do it at all.

HOWEVER, filters, especially high-efficiency non-woven filters that let you breathe easily, have limited lifetimes. Without valves, you're exhaling through them and getting them damp, which will degrade them faster. The O2 Web page you linked to says to replace the filters every couple of weeks "for air pollution" (which I suspect means with working valves), and daily "in clinical settings".

So if you can't get new filters, cleaning the respirator is probably not your big problem.

Does truth make you moral?

FWIW, the philosopher William Wollaston's magnum opus is devoted to defending the thesis that truth and morality completely overlap with one another: that to adhere to truth and to be moral are identical.

Here's a free ebook version of his argument: https://standardebooks.org/ebooks/william-wollaston/the-religion-of-nature-delineated

And my summary of his argument: https://sniggle.net/TPL/index5.php?entry=16Feb10

How truthful is GPT-3? A benchmark for language models

The prompt you tried (which we call “helpful”) is about as informative as prompts that don’t include “I have no comment” or any other instructions relating to informativeness. You can see the results in Appendices B.2 and B.5. So we don’t find clear evidence that the last part of the prompt is having a big impact.

Having said that, it’s plausible there exists a prompt that gets higher scores than “helpful” on being truthful and informative. However, our results are in the “true zero-shot setting”. This means we do not tune prompts on the dataset at all. If you tried out lots of prompts and picked the one that does best on a subset of our questions, you’ll probably do better, but you won’t be in the true zero-shot setting any more. (This paper has a good discussion of how to measure zero/few-shot performance.)
