Quintin Pope


MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models"

GPT-3 (and most pretrained transformers) generate tokens, not words or characters. Sometimes those tokens represent whole words, and sometimes they represent single characters. More common words receive their own token, and less common words are broken into two or more tokens. The vocabulary is tuned to minimize the average length of text, measured in tokens.
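To make the word-vs-piece distinction concrete, here is a toy sketch. It is not GPT-3's tokenizer (which uses a byte-pair encoding learned from data); the vocabulary and the greedy longest-match rule below are made up purely for illustration.

```python
from __future__ import annotations

# Hypothetical toy vocabulary: a few common words and word pieces.
# Any single character is always allowed as a fallback token.
VOCAB = {"the", "cat", "sat", "token", "ization"}


def tokenize(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Greedy longest-match split of one word into vocabulary pieces."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            # A single character always succeeds, so the loop terminates.
            if piece in vocab or end == start + 1:
                tokens.append(piece)
                start = end
                break
    return tokens


print(tokenize("cat"))           # a common word: one token
print(tokenize("tokenization"))  # a rarer word: split into pieces
print(tokenize("zq"))            # unknown text: falls back to characters
```

A larger vocabulary of frequent pieces means fewer tokens per text, which is the trade-off the real vocabulary is tuned for.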

Simple Tricks to Improve KN95 Masks

Looking at Amazon, KN95s cost ~$1 per mask. Replacement filters for the mask I listed cost $18 for the pair. If a pair of filters lasts more than 18 times as long as a KN95, the respirator works out cheaper.
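The break-even arithmetic, as a quick sketch. The function names are mine, the prices are the ones quoted above, and this only compares recurring costs (it ignores the purchase price of the respirator itself).

```python
def breakeven_lifetime_ratio(filter_pair_cost: float, kn95_cost: float) -> float:
    """How many KN95 lifetimes one filter pair must last to cost the same."""
    return filter_pair_cost / kn95_cost


def cheaper_option(filter_lifetime_ratio: float,
                   filter_pair_cost: float = 18.0,
                   kn95_cost: float = 1.0) -> str:
    """Compare recurring costs only, given how many KN95 lifetimes a filter pair lasts."""
    ratio = breakeven_lifetime_ratio(filter_pair_cost, kn95_cost)
    if filter_lifetime_ratio > ratio:
        return "respirator filters"
    if filter_lifetime_ratio < ratio:
        return "KN95s"
    return "tie"


print(breakeven_lifetime_ratio(18.0, 1.0))  # filters must outlast this many KN95s
print(cheaper_option(30))                   # filters lasting 30 KN95 lifetimes
```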

There are two types of filters. Gas filters inactivate harmful gases with chemicals in the filters. They need to be changed often because the filter runs out of chemicals. Particulate filters remove particulates from the air. They need to be changed once the filter clogs up with particulates. If you’re not using them to filter smoke/dust/etc., they can last a long time. I’d change them at least once every 6 months, but even that is only $36 per year in filter costs, which is pretty small.


As someone who exclusively uses respirators, I’d say they’re actually reasonably comfortable once you get used to them.

Simple Tricks to Improve KN95 Masks

Honestly, you’re probably better off just buying a reusable half-face respirator: https://www.grainger.com/product/MILLER-ELECTRIC-Half-Mask-Respirator-Kit-36RC58

They’re more comfortable than you’d expect, offer vastly superior protection, and are far more convenient than having to fiddle with different disposable masks as they fall apart. It’s also probably cheaper than constantly buying new disposable masks. The linked mask is $43.27 and, depending on location, should arrive in a few days.

A simple way to make GPT-3 follow instructions

I was using the API. The trick actually seemed to help a bit, but its responses were still inconsistent and not always “no”.

A simple way to make GPT-3 follow instructions

I’m not sure I follow. Where does AI Dungeon come into this? Could you elaborate?

The GPT-3 answers I gave for “are ghosts real?” came from zero-shot prompting of OpenAI’s GPT-3 API (meaning no prior examples).

If you’re asking about the “You are a superintelligent AI that’s never wrong” trick, then the idea is that, by prefacing your question like this, you can get GPT-3 to write the text that the “superintelligent AI” would write, because GPT-3 thinks that’s the most likely continuation. GPT-3 is more likely to be right if it’s writing dialog for a character that’s always right.

People often give GPT-3 multiple examples of the task they want it to solve (multi-shot prompting), but I wanted to keep things simple in the post. I’ll add some clarification there.

The finetuning scheme I proposed probably wouldn’t be as beneficial in the multi-shot setting as it would be in the zero-shot setting, but I still think it would be beneficial. Explicitly training GPT-3 to follow instructions also seems like a more straightforward way to tell GPT-3 that it’s supposed to follow your instructions than giving it enough examples that GPT-3 picks up on your intent. Working with GPT-3 would be far easier if we didn’t have to generate a list of examples of each task we wanted it to do.

The case for aligning narrowly superhuman models

The control codes could include a special token/sequence that only authorized users can use.

Also, if you’re allowing arbitrary untrusted queries to the model, your security shouldn’t depend on model output anyway. Even if attackers can’t use control codes, they can still likely get the model to do what they want via black-box adversarial search over the input tokens.

The case for aligning narrowly superhuman models

Suppose we want to train GPT-n to pursue any of many different goals (give good medical advice, correctly critique an argument, write formal and polite text, etc.). We could find training data that demonstrates a possible goal and insert natural language control codes around that data.

E.g., suppose XY is a section of training text. X contains a description of a medical problem. Y gives good medical advice. We would then modify XY to be something like:

[give correct medical advice]X[start]Y[end]

We would then repeat this for as many different goals and for as much of the training text as possible. Hopefully, GPT-n will learn that [instructions](problem description)[start] should be followed by the solution to (problem description) in accordance with [instructions], and that it should only revert to “normal text” mode once it sees an [end].
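A minimal sketch of that annotation format, assuming plain string concatenation. The helper names are hypothetical, and a real pipeline would also need to handle tokenization and text that already contains bracketed strings.

```python
def wrap_example(instruction: str, problem: str, solution: str) -> str:
    """Build one annotated training string: [instruction]X[start]Y[end]."""
    return f"[{instruction}]{problem}[start]{solution}[end]"


def build_prompt(instruction: str, problem: str) -> str:
    """At inference time, stop at [start] so the model writes the solution."""
    return f"[{instruction}]{problem}[start]"


def extract_solution(generated: str) -> str:
    """Keep only the text generated before the first [end] marker."""
    return generated.split("[end]", 1)[0]


print(wrap_example("give correct medical advice", "X", "Y"))
print(build_prompt("write formal and polite text", "X"))
```

At inference time you would feed the model `build_prompt(...)` and cut its output at the first `[end]`, since anything after that is back in “normal text” mode.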

If GPT-n generalizes well, we may be able to provide customized control codes that don’t appear anywhere in the training data and have it follow our instructions. I think this approach will scale well because bigger models are better at learning rare patterns in their data. We just need to annotate enough examples to teach the intended pattern. This may even be easier for bigger/more sample efficient models.

(This is basically the approach described in https://arxiv.org/abs/1909.05858 but with more focus on generalizing control codes with natural language.)

GPT-3 Fiction Samples

Same. Specifically, I went from predicting 50% chance of human-level AGI within 40 years to 50% chance within 10 years.

Andrew Mayne was also given access to the GPT-3 API. You can read his impressions here: https://andrewmayneblog.wordpress.com/

I found his results very impressive as well. For example, he's able to prompt GPT-3 to summarize a Wikipedia article on quantum computing at either a second grade or an eighth grade level, depending on the prompt.

I actually put together a presentation on GPT-like architectures and their uses for my advisor: https://docs.google.com/presentation/d/1kCJ2PJ_3UteHBX5TWZyrF5ontEdNx_B4vi6KTmQmPNo/edit?usp=sharing

It's not really meant to be a standalone explanation, but it does list some of GPT-2/3's more impressive abilities. After compiling the presentation, I think we'll look back on GPT-3 as the "Wright brothers" moment for AGI.

Consider: this post suggests GPT-3 cost ~$4.6 million to train: https://lambdalabs.com/blog/demystifying-gpt-3. It would be well within Google/Microsoft/Amazon/DoD/etc.'s budget to increase model size by another 2 (possibly 3) orders of magnitude. Based on the jump in GPT-3's performance going from 13B parameters to 175B parameters, such a "GPT-4" would be absolutely stunning.

How effective are tulpas?

I don't have a full tulpa, but I've been working on one intermittently for the past ~month. She can hold short conversations, but I'm hesitant to continue the process because I'm concerned that her personality won't sufficiently diverge from mine.

I think it's plausible that a tulpa could improve (at least some of) your mental capabilities. I draw a lot of my intuition in this area from a technique in AI/modeling called ensemble learning, in which you use the outputs of multiple models to make higher quality decisions than is possible with a single model. I know it's dangerous to draw conclusions about human intelligence from AI, but you can use ensemble learning with pretty much any set of models, so something similar is probably possible with the human brain.

Some approaches in ensemble learning (boosting and random forest) suggest that it's important for the individual models to vary significantly from each other (thus my interest in having a tulpa that's very different from me). One advantage of ensemble approaches is that they can better avoid overfitting to spurious correlations in their training data. I think that a lot of harmful human behavior is (very roughly) analogous to overfitting to unrepresentative experiences, e.g., many types of learned phobias. I know my partial tulpa is much less of a hypochondriac than I am, is less socially anxious and, when aware enough to do so, reminds me not to pick at my cuticles.
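To illustrate why diversity matters, here is a small simulation sketch. It uses plain averaging rather than boosting or random forests, and the setup (unit-variance errors that are partly shared between models) is an assumption I picked for simplicity. It says nothing about brains; it only shows the statistical point.

```python
import math
import random


def ensemble_mse(n_models: int, correlation: float,
                 trials: int = 2000, seed: int = 0) -> float:
    """Mean squared error of an averaged ensemble estimating a true value of 0.

    Each model's error has unit variance and mixes a component shared by
    every model with an independent one. `correlation` is the pairwise
    correlation between the errors of any two models.
    """
    rng = random.Random(seed)
    shared_weight = math.sqrt(correlation)
    own_weight = math.sqrt(1.0 - correlation)
    total = 0.0
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)
        errors = [shared_weight * shared + own_weight * rng.gauss(0.0, 1.0)
                  for _ in range(n_models)]
        average_error = sum(errors) / n_models
        total += average_error ** 2
    return total / trials


similar = ensemble_mse(n_models=5, correlation=0.9)  # near-copies of one model
diverse = ensemble_mse(n_models=5, correlation=0.0)  # independent models
print(f"similar models: {similar:.2f}, diverse models: {diverse:.2f}")
```

Under these assumptions, the expected squared error of the average is correlation + (1 - correlation) / n_models, so averaging five near-copies barely helps while averaging five independent models cuts the error to about a fifth.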

Posters on the tulpas subreddit seem split on whether a host's severe mental health issues (depression, autism, OCD, bipolar, etc.) will affect their tulpas, with several anecdotes suggesting tulpas can have a positive impact. There's also this paper: Tulpas and Mental Health: A Study of Non-Traumagenic Plural Experiences, which finds tulpas may benefit the mentally ill. However, it's in a predatory journal (of the pay-to-publish variety). There appears to be an ongoing study by Stanford researchers looking into tulpas' effects on their hosts and potential fMRI correlates of tulpa-related activity, so better data may arrive in the coming months.

In terms of practical benefit, I suspect that much of the gain comes from your tulpa pushing you towards healthier habits through direct encouragement and social/moral pressure (if you think your tulpa is a person who shares your body, that's another sentient being whom your own lack of exercise/healthy food/sleep is directly harming).

Additionally, tulpas may be a useful hedge against suicide. Most people (even most people with depression) are not suicidal most of the time. Even if the tulpa's emotional state correlates with the host's, the odds of both host and tulpa being suicidal at once are probably very low. Thus, a suicidal person with a tulpa will usually have someone to talk them out of acting.

Regarding performance degradation, my impression from reading the tulpa.info forums is that most people have tulpas that run in serial with their original minds (i.e., host runs for a time, tulpa runs for a time, then host), rather than in parallel. It's still possible that having a tulpa leads to degradation, but probably more in the way that constantly getting lost in thought might, as opposed to losing computational resources. In this regard, I suspect that tulpas are similar to hobbies. Their impact on your general performance depends on how you pursue them. If your tulpa encourages you to exercise, mental performance will probably go up. If your tulpa constantly distracts you, performance will probably go down.

I've been working on an aid to tulpa development inspired by the training objectives of state-of-the-art AI language models such as BERT. It's a Google Colab notebook, which you'll need a Google account to run from your browser. It takes text from a number of possible books from Project Gutenberg and lets your tulpa perform several language/personality modeling tasks of varying complexity, ranging from simply predicting the content of masked words to generating complex emotional responses. Hopefully, it can help reduce the time required for tulpas to reach vocality and ease the cost of experimenting in this space.
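For a sense of what the simplest task looks like, here is a sketch of a masked-word exercise. This is not the notebook's code: the function name is hypothetical, the `[MASK]` placeholder is borrowed from BERT, and it assumes a non-empty sentence with single spaces between words.

```python
from __future__ import annotations

import random


def make_masked_task(sentence: str, rng: random.Random,
                     mask: str = "[MASK]") -> tuple[str, str]:
    """Hide one randomly chosen word; return (masked sentence, hidden word)."""
    words = sentence.split()
    index = rng.randrange(len(words))  # raises ValueError on an empty sentence
    answer = words[index]
    words[index] = mask
    return " ".join(words), answer


rng = random.Random(0)  # fixed seed so the exercise is reproducible
masked, answer = make_masked_task("It was the best of times", rng)
print(masked)  # the sentence with one word hidden
```

The tulpa guesses the hidden word, and the guess is then compared against `answer`.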
