Suppose we considered simulating some human for a while to get a single response. My math heuristics are throwing up the hypothesis that proving what the response would be is morally equivalent to actually running the simulation - it's just another substrate. Thoughts? Implications? References?
I've found a category-theoretic model of BCI-powered Reddit!
Fix a set of posts. Its subsets form a category whose morphisms are inclusions that map every element to itself. Call its forgetful functor to Set f. Each BCI can measure its user, for example by producing a vector of neuron activations. Its possible measurements form a space, and these spaces form a category. (Its morphisms would translate between brains, and each morphism would keep track of how well it preserves meaning.) Call its forgetful functor to Set g.
The comma category f/g has users as its objects (each a Set-function from some set of posts they've seen to their measured reactions), and each morphism relates a user to another brain that saw at least the same posts and reacted similarly to the ones the first user saw.
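To make the morphism condition concrete, here is a minimal Python sketch built on assumptions that are not in the post: a user is a brain label plus a dict from seen posts to measurement vectors, a translation between brains is an arbitrary Python function, and "reacted similarly" is measured as Euclidean distance. A morphism in f/g is a pair (inclusion of post sets, translation of brains) that makes the square commute, so the hypothetical `morphism_error` reports how far that square is from commuting, playing the role of the morphism keeping track of how well it preserves meaning.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Measurement = Tuple[float, ...]  # e.g. a vector of neuron activations


@dataclass
class User:
    """An object of f/g: a function from the posts a user has seen to their
    measured reactions, tagged with which brain (measurement space) it lands in."""
    brain: str
    reactions: Dict[str, Measurement]


def morphism_error(u: User, v: User,
                   translate: Callable[[Measurement], Measurement]) -> float:
    """Check whether (inclusion of u's posts, translate) is a morphism u -> v.

    The square commutes when translate(u(p)) == v(p) for every post p that u saw.
    Returns the worst-case distance, i.e. how badly the square fails to commute,
    or math.inf if v has not seen every post u has (no inclusion exists)."""
    if not u.reactions.keys() <= v.reactions.keys():
        return math.inf
    return max((math.dist(translate(r), v.reactions[p])
                for p, r in u.reactions.items()), default=0.0)
```

For instance, if two BCIs happened to record the same two features in opposite order, `morphism_error(alice, bob, lambda r: (r[1], r[0]))` would be 0.0 whenever Bob saw everything Alice saw and reacted the same way.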
The product in f/g tells you how to translate between a set of brains. A user could telepathically tell another what headspace they're in, so long as the other has ever demonstrated a corresponding experience. Note that a Republican sending his love for Republican posts might lead to a Democrat receiving his hatred for Republican posts.
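A much-simplified, hypothetical reading of the telepathy idea, not a construction of the categorical product itself: treat the posts two users have both seen as a phrasebook between their brains. The sender's headspace is matched to the shared post whose reaction it most resembles, and the receiver gets their own reaction to that post. The function name `translate_headspace` and the nearest-neighbour matching rule are assumptions for illustration.

```python
import math
from typing import Dict, Optional, Tuple

Measurement = Tuple[float, ...]


def translate_headspace(headspace: Measurement,
                        sender: Dict[str, Measurement],
                        receiver: Dict[str, Measurement]) -> Optional[Measurement]:
    """Map the sender's headspace to the receiver's corresponding experience.

    Finds the post both have seen on which the sender's reaction is closest to
    `headspace`, and returns the receiver's reaction to that same post.
    Returns None if they share no posts, i.e. the receiver has never
    demonstrated a corresponding experience."""
    shared = sender.keys() & receiver.keys()
    if not shared:
        return None
    best = min(shared, key=lambda p: math.dist(sender[p], headspace))
    return receiver[best]
```

With one-dimensional "approval" measurements, a Republican who loved a Republican post (1.0) and a Democrat who hated it (-1.0) illustrate the caveat above: sending the Republican's approval headspace hands the Democrat his own hatred for that post.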
The coproduct in f/g tells you how to extrapolate expected reactions between a set of brains. A user could simply put himself into a headspace and get handed a list of posts he hasn't seen that would be expected to put him into that headspace.
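A rough, hypothetical sketch of that extrapolation, closer to neighbour-based collaborative filtering than to the coproduct construction: it assumes, unlike the post, that all brains share one measurement space, so another user whose reactions agree with yours on every shared post (within `agree_tol`) can stand in for you on the posts you haven't seen. The names `extrapolate`, `agree_tol` and `radius` are illustrative.

```python
import math
from typing import Dict, List, Tuple

Measurement = Tuple[float, ...]
Reactions = Dict[str, Measurement]


def extrapolate(user: Reactions, others: List[Reactions],
                headspace: Measurement, agree_tol: float = 0.5,
                radius: float = 0.5) -> List[str]:
    """Suggest unseen posts expected to put `user` into `headspace`.

    Another user counts as a stand-in if they saw at least one of the same posts
    and reacted within `agree_tol` of `user` on every shared post. Their
    reactions to posts `user` hasn't seen are taken as predictions, and posts
    predicted to land within `radius` of `headspace` are returned."""
    suggestions = set()
    for other in others:
        shared = user.keys() & other.keys()
        if not shared or any(math.dist(user[p], other[p]) > agree_tol
                             for p in shared):
            continue  # no evidence this brain reacts like ours
        for post, reaction in other.items():
            if post not in user and math.dist(reaction, headspace) <= radius:
                suggestions.add(post)
    return sorted(suggestions)
```

The agreement test is a crude stand-in for "there is a good morphism between these two users"; a fuller version would translate between brain spaces instead of comparing raw coordinates.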
Hearthstone has recently released Zephrys the Great, a card that looks at the public game state and gives you a choice between three cards that would be useful right now. You can see it in action here. I am impressed by the diversity of the choices it gives. It's an advisor AI that seems friendlier than Amazon's or YouTube's recommendation algorithms, because its secondary optimization incentive is fun, not money!
Could we get them to open-source the implementation so people could try writing different advisor AIs to use in the card's place for, say, their weekly rule-changing Tavern Brawls?
OpenAI has a 100x profit cap for investors. Could another form of investment restriction reduce AI race incentives?
The market selects for people who are good at maximizing money and care to do so. I'd expect there are some rich people who care little whether they go bankrupt or the world is destroyed.
Such a person might expect that if OpenAI launches their latest AI draft, either the world is turned into paperclips or all investors get the maximum payoff. So he might invest all his money in OpenAI and pressure OpenAI (via shareholder swing voting or less regulated means) to launch it.
If OpenAI ruled that no one may invest more than a certain percentage of their net worth in OpenAI, such a person would be forced to retain something to protect.
The WaveFunctionCollapse (wfc) algorithm measures whichever not-yet-collapsed tile currently has the lowest entropy. GPT-3 always just measures the next token. Of course, in prose those are usually the same, but I expect some qualitative improvements once we get three things: structured data with holes such that any hole might be the one with the lowest entropy, a transformer trained to fill holes, and the resulting ability to pick which hole to fill next.
Until then, I expect the prompts/GPT protocols that perform well to be the ones that happen to present the holes in your data in the order wfc would have picked: ask it to show its work, and don't ask it to write the bottom line of its reasoning process first.
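A minimal sketch of the two selection rules being contrasted. The per-hole probability distributions would come from a hypothetical hole-filling model; here they are hand-written, and filled holes are marked `None`. The wfc rule takes the open hole with the lowest Shannon entropy (the real WaveFunctionCollapse also breaks ties with a little random noise, omitted here), while the GPT-3 rule always takes the first open position.

```python
import math
from typing import Dict, List, Optional

Hole = Optional[Dict[str, float]]  # candidate fillers -> probability; None = filled


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy, in bits, of a distribution over candidate fillers."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def next_hole_wfc(holes: List[Hole]) -> int:
    """wfc-style choice: the open hole whose distribution has the lowest entropy."""
    open_holes = [i for i, d in enumerate(holes) if d is not None]
    return min(open_holes, key=lambda i: entropy(holes[i]))


def next_hole_left_to_right(holes: List[Hole]) -> int:
    """GPT-3-style choice: always the first open position."""
    return next(i for i, d in enumerate(holes) if d is not None)
```

If hole 0 is the bottom line of an argument (a coin flip between "A" and "B", 1 bit) and hole 1 is an early, nearly forced step (0.99 vs 0.01, about 0.08 bits), wfc fills hole 1 first while left-to-right generation commits to the conclusion first, which is the failure mode that "show your work" prompts route around.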
Long shortform short: Include the sequences in your prompt as instructions :)
All the knowledge a language model character demonstrates is contained in the model. I expect that there is also some manner of intelligence, perhaps pattern-matching ability, such that the model cannot write a character smarter than itself. The better we engineer our prompts, the smaller the model overhang, i.e. the gap between what the model could do and what the character it writes demonstrates. The larger the overhang, the more opportunity for inner alignment failure.
I expect that all that's required for a Singularity is to wait a few years for the sort of language model that can faithfully replicate a human researcher's thoughts, then make it generate a thousand years' worth of that researcher's internal monologue, perhaps with access to the internet.
Neural networks should be good at this task - we have direct evidence that neural networks can run human brains.
Whether our world's plot has a happy ending then merely depends on the details of that prompt/protocol, such as whether the simulated researcher decides to solve alignment before running a successor. It's probably simple to check the character's alignment, though: we have access to his thoughts. A harder question is whether the first LM able to run humans is still inner aligned.
https://arbital.com/p/cev/ : "If any hypothetical extrapolated person worries about being checked, delete that concern and extrapolate them as though they didn't have it. This is necessary to prevent the check itself from having a UDT influence on the extrapolation and the actual future."
Our altruism (like many other emotions) is, evolutionarily, just an acausal reaction to the worry that we're being simulated by other humans.
It seems like a jerk move to punish someone for being self-aware enough to replace their emotions with the decision-theoretic considerations they evolved to approximate.
And unnecessary! For if they behave nicely when checked because they worry they're being checked, they should also behave nicely when unchecked.
I think (given my extremely limited understanding of this stuff) this is to prevent UDT agents from fooling the people simulating them by recognizing that they're in a simulation.
I.e., you want to ignore the following code:
Suppose all futures end in FAI or UFAI. Suppose there were a magic button that rules out the UFAI futures if FAI were likely enough, and the FAI futures otherwise. The cutoff happens to be chosen to conserve your subjective probability of FAI. I see the button as transforming our game for the world's fate from one of luck into one of skill. Would you press it?
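One way to formalize "likely enough", which is an assumption and not necessarily the intended one: each world you might be in has an objective chance q of reaching FAI, and your uncertainty is a list of equally weighted sample values of q. Your prior P(FAI) is then the mean of q, and a button that guarantees FAI exactly in worlds with q > t conserves that probability when the fraction of worlds above t equals the mean, i.e. P(q > t) = E[q]. The hypothetical `button_cutoff` below picks such a t; with finitely many samples it can only match to within half a sample.

```python
from typing import List


def button_cutoff(qs: List[float]) -> float:
    """Pick a cutoff t so that P(FAI) with the button equals P(FAI) without it.

    qs: equally weighted samples of the objective chance of FAI, one per world
    you might be in. The button makes FAI certain where q > t, UFAI elsewhere."""
    prior = sum(qs) / len(qs)            # subjective P(FAI) without the button
    ranked = sorted(qs, reverse=True)
    k = round(prior * len(ranked))       # how many worlds should end in FAI
    if k == 0:
        return ranked[0]                 # no sampled world exceeds the maximum
    if k == len(ranked):
        return ranked[-1] - 1.0          # every world clears the cutoff
    return ranked[k]                     # exactly the top k exceed it (assumes no ties)


def p_fai_with_button(qs: List[float], t: float) -> float:
    """Fraction of sampled worlds that the button sends to FAI."""
    return sum(q > t for q in qs) / len(qs)
```

Under this model, the luck-versus-skill framing becomes visible: without the button a world with q = 0.6 still loses 40% of the time, while with it the outcome depends only on whether the world's q cleared t.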
Consider a singleton overseeing ten simpletons. Its ontology is that each particle has a position. Each simpleton prefers having all of its body's particles inside its body to any alternative. The singleton aggregates their preferences by letting each of them rule out 10% of the space of possibilities. This does not let them guarantee their integrity, since the states in which some particle of theirs has strayed make up far more than 10% of that space. What if it instead considered changes to a single particle's position rather than whole states? Each would rule out any change that removes a particle from its body, which fits fine in its 10%. Iterating non-ruled-out changes would end up in an optimal state starting from any state. This isn't a free lunch, but we should formalize what we paid.
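A toy simulation of the single-position variant, with every number an assumption chosen for illustration: 100 cells on a line, ten bodies of ten contiguous cells, three particles per simpleton. A change moves one particle to one cell, so there are 30 × 100 = 3000 changes, and each simpleton vetoes at most 3 × 90 = 270 of them (9%), inside its 10% budget. The sketch also assumes the singleton picks uniformly at random among the non-vetoed changes; `body`, `vetoes`, `intact` and `run` are hypothetical names.

```python
import random

N_AGENTS = 10
CELLS_PER_BODY = 10
PARTICLES_PER_AGENT = 3
N_CELLS = N_AGENTS * CELLS_PER_BODY                 # 100 positions in total
OWNER = [p // PARTICLES_PER_AGENT                    # particle index -> simpleton
         for p in range(N_AGENTS * PARTICLES_PER_AGENT)]
MOVES = [(p, c) for p in range(len(OWNER)) for c in range(N_CELLS)]


def body(agent):
    """Hypothetical layout: each body is a contiguous block of cells."""
    return range(agent * CELLS_PER_BODY, (agent + 1) * CELLS_PER_BODY)


def vetoes(agent, state):
    """Single-position changes (particle, new_cell) this simpleton rules out:
    moving one of its own particles from inside its body to outside it."""
    mine = body(agent)
    return {(p, c) for p, pos in enumerate(state)
            if OWNER[p] == agent and pos in mine
            for c in range(N_CELLS) if c not in mine}


def intact(state):
    """True when every particle sits inside its owner's body."""
    return all(pos in body(OWNER[p]) for p, pos in enumerate(state))


def run(state, rng, max_steps=100_000):
    """Apply uniformly random non-vetoed changes until everyone is intact."""
    for _ in range(max_steps):
        if intact(state):
            return state
        banned = set().union(*(vetoes(a, state) for a in range(N_AGENTS)))
        p, c = rng.choice([m for m in MOVES if m not in banned])
        state[p] = c
    return state


if __name__ == "__main__":
    rng = random.Random(0)
    start = [rng.randrange(N_CELLS) for _ in OWNER]
    print("intact at start:", intact(start))
    print("intact at end:", intact(run(start, rng)))
```

With random choice, a particle that reaches its own body can never leave it again, so a run ends with every simpleton intact with probability 1. The randomness matters: the vetoes alone would not stop a chooser from shuffling a stray particle between outside cells forever.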