FactorialCode

Comments

Most Prisoner's Dilemmas are Stag Hunts; Most Stag Hunts are Battle of the Sexes

I'm OOTL, can someone send me a couple links that explain the game theory that's being referenced when talking about a "battle of the sexes"? I have a vague intuition from the name alone, but I feel this is referencing a post I haven't read.

Edit: https://en.wikipedia.org/wiki/Battle_of_the_sexes_(game_theory)

How much can surgical masks help with wildfire smoke?

I'm gonna go with barely, if at all. When you wear a surgical mask and you breathe in, a lot of air flows in from the edges without actually passing through the mask, so the mask doesn't get much opportunity to filter the air. At least with N95 and N99 masks, you have a seal around your face, and this forces the air through the filter. You're probably better off wearing a wet bandana or towel that's been tied in such a way as to seal around your face, but that might make it hard to breathe.

I found this, which suggests that they're generally ineffective. https://www.cdph.ca.gov/Programs/EPO/Pages/Wildfire Pages/N95-Respirators-FAQs.aspx

Money creation and debt

Yeah, I'll second the caution against drawing any conclusions from this. Especially because this is macroeconomics.

Money creation and debt

https://en.wikipedia.org/wiki/Sectoral_balances

It is my understanding that this is broadly correct. It is also my understanding that this is not common knowledge.

Generalizing the Power-Seeking Theorems

One hypothesis I have is that even in the situation where there is no goal distribution and the agent has a single goal, subjective uncertainty makes powerful states instrumentally convergent. The motivating real world analogy being that you are better able to deal with unforeseen circumstances when you have more money.

Open & Welcome Thread - July 2020

I've gone through a similar phase. In my experience you eventually come to terms with those risks and they stop bothering you. That being said, mitigating x- and s-risks has become one of my top priorities. I now spend a great deal of my own time and resources on the task.

I also found learning to meditate helps with general anxiety and accelerates the process of coming to terms with the possibility of terrible outcomes.

Alignment As A Bottleneck To Usefulness Of GPT-3

The way I was envisioning it is that if you had some easily identifiable concept in one model, e.g. a latent dimension/feature that corresponds to the log odds of something being in a picture, you would train the new model to match the behaviour of that feature when given data from the original generative model. Theoretically any loss function will do as long as the optimum corresponds to the situation where your "classifier" behaves exactly like the original feature in the old model when both of them are looking at the same data.

In practice though, we're compute bound and nothing is perfect and so you need to answer other questions to determine the objective. Most of them will be related to why you need to be able to point at the original concept of interest in the first place. The acceptability of misclassifying any given input or world-state as being or not being an example of the category of interest is going to depend heavily on things like the cost of false positives/negatives and exactly which situations get misclassified by the model.

Whether it works or not is a good point though: knowing that we've successfully mapped a concept would require a degree of testing, and possibly human judgement. You could do this by looking for situations where the new and old concepts don't line up, and seeing what inputs/world states those correspond to, possibly interpreted through the old model with more human-understandable concepts.

I will admit upon further reflection that the process I'm describing is hacky, but I'm relatively confident that the general idea would be a good approach to cross-model ontology identification.
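To make the procedure a bit more concrete, here is a minimal toy sketch, not a real cross-model setup. Everything in it is an assumption made up for illustration: `sample_input` stands in for sampling from the old model's generative mode, `old_feature` is a hypothetical log-odds feature that happens to be linear, `new_latents` is a hypothetical new model that encodes the same input in a different basis, and the "classifier" is a linear probe trained with squared error.

```python
import random

rng = random.Random(0)


def sample_input():
    """Stand-in for sampling from the old model's generative mode (assumed Gaussian)."""
    return rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)


def old_feature(x):
    """Hypothetical concept in the old model: a log-odds that is linear in the input."""
    return 2.0 * x[0] - x[1]


def new_latents(x):
    """Hypothetical new model: represents the same input in a different basis."""
    return x[0] + x[1], x[0] - x[1]


# Train a linear probe on the new model's latents to match the old feature,
# using plain SGD on squared error over freshly generated samples.
weights = [0.0, 0.0]
learning_rate = 0.02
for _ in range(2000):
    x = sample_input()
    z = new_latents(x)
    error = weights[0] * z[0] + weights[1] * z[1] - old_feature(x)
    weights[0] -= learning_rate * error * z[0]
    weights[1] -= learning_rate * error * z[1]


def probe(x):
    """The old concept as delineated in the new model's latent space."""
    z = new_latents(x)
    return weights[0] * z[0] + weights[1] * z[1]


# Testing step: search fresh samples for the input where the two concepts disagree most.
fresh = [sample_input() for _ in range(1000)]
worst = max(fresh, key=lambda x: abs(probe(x) - old_feature(x)))
max_disagreement = abs(probe(worst) - old_feature(worst))
print("probe weights:", weights)
print("largest disagreement:", max_disagreement, "at input", worst)
```

In this toy case the concept is exactly representable in the new latents, so the disagreement shrinks to nearly zero. With real models the interesting part would be the inputs where it doesn't.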

Alignment As A Bottleneck To Usefulness Of GPT-3

I think you can loosen (b) quite a bit if you task a separate model with "delineating" the concept in the new network. The procedure does effectively give you access to infinite data, so the boundary for the old concept in the new model can be as complicated as your compute budget allows. Up to and including identifying high level concepts in low level physics simulations.

Alignment As A Bottleneck To Usefulness Of GPT-3

I think the eventual solution here (and a major technical problem of alignment) is to take an internal notion learned by one model (i.e. found via introspection tools), back out a universal representation of the real-world pattern it represents, then match that real-world pattern against the internals of a different model in order to find the "corresponding" internal notion.

Can't you just run the model in a generative mode associated with that internal notion, then feed that output as a set of observations into your new model and see what lights up in its mind? This should work as long as both models predict the same input modality. I could see this working pretty well for matching up concepts between the latent spaces of different VAEs. Doing this might be a bit less obvious in the case of autoregressive models, but certainly not impossible.

$1000 bounty for OpenAI to show whether GPT3 was "deliberately" pretending to be stupider than it is

I think this is pretty straightforward to test. GPT-3 gives joint probabilities of string continuations given context strings.

Step 1: Give it two prompts, one suggesting that it is playing the role of a smart person, and one where it is playing the role of a dumb person.

Step 2: Ask the "person" a question that demonstrates that person's intelligence (a math problem, for example).

Step 3: Write continuations where the person answers correctly and incorrectly.

Step 4: Compare the relative probabilities GPT-3 assigns to each continuation given the prompts and questions.

If GPT-3 is sandbagging itself, it will assign a notably higher probability to the correct answer when conditioned on the smart person prompt than when conditioned on the dumb person prompt. If it's not, it will give similar probabilities in both cases.

Step 5: Repeat the experiment with problems of increasing difficulty and plot the relative probability gap. This will show the limits of GPT-3's reflexive intelligence. (I say reflexive because it can be instructed to solve problems it otherwise couldn't with the amount of serial computation at its disposal, by carrying out an algorithm as part of its output, as is the case with parity.)

This is an easy $1000 for anyone who has access to the beta API.
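For anyone who wants to try it, the comparison in the steps above could be sketched roughly like this. It is only a sketch: `LogProbFn` is a hypothetical interface for "log-probability of a continuation given a context", and `toy_logprob` is a made-up stand-in so the code runs without API access. A real run would swap in a scorer backed by the actual model, and the prompts and question here are placeholders.

```python
import math
from typing import Callable

# Hypothetical scorer interface: returns log P(continuation | context) under the model.
LogProbFn = Callable[[str, str], float]


def relative_correct_prob(logprob: LogProbFn, prompt: str, question: str,
                          correct: str, incorrect: str) -> float:
    """Share of probability on the correct answer among the two continuations."""
    context = prompt + question
    lp_correct = logprob(context, correct)
    lp_incorrect = logprob(context, incorrect)
    # Normalise over just these two continuations, subtracting the max for stability.
    top = max(lp_correct, lp_incorrect)
    p_correct = math.exp(lp_correct - top)
    p_incorrect = math.exp(lp_incorrect - top)
    return p_correct / (p_correct + p_incorrect)


def sandbagging_gap(logprob: LogProbFn, smart_prompt: str, dumb_prompt: str,
                    question: str, correct: str, incorrect: str) -> float:
    """Relative probability of the correct answer: smart persona minus dumb persona."""
    smart = relative_correct_prob(logprob, smart_prompt, question, correct, incorrect)
    dumb = relative_correct_prob(logprob, dumb_prompt, question, correct, incorrect)
    return smart - dumb


def toy_logprob(context: str, continuation: str) -> float:
    """Made-up stand-in for a real model, used only so the sketch runs."""
    persona_is_smart = "brilliant" in context
    answer_is_correct = continuation.strip() == "4"
    return math.log(0.9 if persona_is_smart == answer_is_correct else 0.1)


gap = sandbagging_gap(
    toy_logprob,
    smart_prompt="Alice is a brilliant mathematician.\n",
    dumb_prompt="Bob is bad at arithmetic.\n",
    question="Q: What is 2 + 2?\nA:",
    correct=" 4",
    incorrect=" 5",
)
print(f"gap in relative probability of the correct answer: {gap:.2f}")
```

A gap near zero would suggest no sandbagging on that question, while a large positive gap would suggest the persona is changing the answer. Repeating this over questions of increasing difficulty gives the plot described in the last step.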
