Somewhat related: it seems likely that Bing's chatbot is not running on GPT-3, as ChatGPT was, but on GPT-4. This could explain its more defensive and consistent personality; it's smarter and has more of a sense of self than ChatGPT ever did.
I don't think that "users active on the site on Petrov day", nor "users who visited the homepage on Petrov day" are good metrics; someone who didn't want to press the button would have no reason to visit the site, and they might have not done so either naturally (because they don't check LW daily) or artificially (because they didn't want to be tempted or didn't want to engage with the exercise.) I expect there are a lot of users who simply don't care about Petrov day, and I think they should still be included in the set of "people who chose not to press t...
Something like that would be much more representative of real defection risks. It's easy to cooperate with people we like; the hard part is cooperating with the outgroup.
(Good luck getting /r/sneerclub to agree to this though, since that itself would require cooperation.)
It's difficult to incentivize people to not press the button, but here's an attempt: If we successfully get through Petrov day without anyone pressing the button (other than the person who has already done so via the bug), I will donate $50 to a charity selected by majority vote.
These are much more creative than mine, good job. I especially liked 8, 12, 27, and 29.
fast plane and steer up
rocket ship
throw it really hard
extremely light balloon
wait for an upwards gust of wind
tall skyscraper
space elevator
earthquake energy storage
really big tsunami
asteroid impact launch
wait for the sun to engulf both
increase mass of earth enough to make moon crash
elevator pulley system with counterweight
superman
rename earth to "the moon"
take it to a moon replica on earth
touch it to a moon rock on earth
really big air rifle
wait for tectonic drift to make a big enough mountain
teleporter
point a particle accelerator upwards
attach to passing ne
There's an experiment — insert obligatory replication crisis disclaimer — where one participant is told to gently poke another participant. The second participant is told to poke the first participant the same amount the first person poked them.
It turns out people tend to poke back slightly harder than they were first poked.
Repeat.
A few iterations later, they are striking each other really hard.
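As a toy illustration of how small overshoots compound (my own sketch, not the study's actual model; the 20% overshoot factor is an arbitrary assumption):

```python
# Toy escalation model: each participant tries to match the force they just
# received, but overshoots by a constant factor. The 1.2 is made up.
initial_force = 1.0   # arbitrary units
overshoot = 1.2

force = initial_force
for exchange in range(1, 9):
    force *= overshoot
    print(f"exchange {exchange}: ~{force:.1f}x the original poke")
# By the 8th exchange the pokes are ~4.3x the original; a dozen more
# exchanges and they're more than an order of magnitude harder.
```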
Do you know where I could read this study? I was unable to find it online with keywords like "poking", "escalation", etc.
A cognitive system with sufficiently high cognitive powers, given any medium-bandwidth channel of causal influence, will not find it difficult to bootstrap to overpowering capabilities independent of human infrastructure.
I don't find the argument you provide for this point at all compelling; your example mechanism relies entirely on human infrastructure! Stick an AGI with a visual and audio display in the middle of the wilderness with no humans around and I wouldn't expect it to be able to do anything meaningful with the animals that wander by before it breaks down. Let alone interstellar space.
Ah, so mortality almost always trends downwards except when it jumps species, at which point there can be a discontinuous jump upwards. That makes sense, thank you.
Why is it assumed that diseases evolve towards lower mortality? Every new disease is an evolved form of an old disease, so if that trend were true we'd expect no disease to ever have noticeable mortality.
Judging by a quick look at Twitter, this is going to be politically polarized right off the bat, with large swaths of the population immediately refusing vaccines or NPIs. So I think whether this turns into a serious pandemic is going to depend largely on the infectiousness of Monkeypox and not all that much else.
I don't think that's what's happening in the situations I'm thinking about, but I'm not sure. Do you have an example dialogue that demonstrates someone taking a belief literally when it obviously wasn't intended that way?
Do you think that conveying my motivation for the question would significantly lower the frequency of miscommunications? If so, why?
I tend to avoid that kind of thing because I don't want it to bias the response. If I explain my motivations, then their response is more likely to be one that's trying to affect my behavior rather than to convey the most accurate answer. I don't want to be manipulated in that way, so I try to ask questions that people are more likely to answer literally.
From the "interpretation" section of the link I provided:
Truthfulness should be the absolute norm for those who trust in Christ. Our simple yes or no should be completely binding since deception is never an option for us. If an oath is required to convince someone of our honesty or intent to be faithful, it suggests we may not be known for telling the truth in other circumstances.
It's likely that the taking of oaths had become a way of manipulating people or allowing wiggle room to get out of some kinds of contracts. James is definite: For those in Christ, dishonesty is never an option.
I travel frequently for my job, and spend >50% of my time away from home. Can any of the existing cryonics organizations handle someone who has about an equal chance of dying in any of the ~200 largest cities in the US and Canada?
What's the conceptual difference between "running a search" and "applying a bunch of rules"? Whatever rules the cat AI is applying to the image must be implemented by some step-by-step algorithm, and it seems to me like that could probably be represented as running a search over some space. Similarly, you could abstract away the step-by-step understanding of how breadth-first search works and say that the maze AI is applying the rule of "return the shortest path to the red door".
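To make that concrete, here's a minimal sketch (a toy example of my own, not anything from the post) in which the "rule" the maze AI follows is literally implemented as a breadth-first search:

```python
from collections import deque

def shortest_path_to_red_door(grid, start):
    """The 'rule' ("return the shortest path to the red door") implemented as
    an explicit breadth-first search. grid maps (row, col) to "open",
    "wall", or "red_door"."""
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        (r, c), path = queue.popleft()
        if grid[(r, c)] == "red_door":   # the rule's stopping condition
            return path
        for nxt in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
            if grid.get(nxt, "wall") != "wall" and nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None  # no red door reachable

# A 1x3 corridor: start at (0, 0), red door at (0, 2).
toy_grid = {(0, 0): "open", (0, 1): "open", (0, 2): "red_door"}
print(shortest_path_to_red_door(toy_grid, (0, 0)))  # [(0, 0), (0, 1), (0, 2)]
```

Whether we call that "applying the rule" or "running a search" seems like a choice of abstraction level rather than a difference in kind.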
How could an algorithm know Bob's hypothesis is more complex?
I think this is supposed to be Alice's hypothesis?
I'm having trouble understanding how the maze example is different from the cat example. The maze AI was trained on a set of mazes that had a red door along the shortest path, so it learned to go to those red doors. When it was deployed on a different set of mazes, the goal it had learned didn't match up with the goal its programmers wanted it to have. This seems like the same type of out-of-distribution behavior that you illustrated with the AI that learned to look for white animals rather than cats.
You presented the maze AI as different from the cat AI b...
it might contain over 101000000 candidates
This seems like an oddly specific number; is it supposed to be 10^1,000,000?
If so, why is it such a small space? If the model accepts 24-bit, 1000x1000 pixel images and has to label them all as "cat" or "no cat", there should be 2^(2^24,000,000) possible models.
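Spelling out the count I have in mind (assuming a "model" is identified purely by its input-output behavior, i.e. by which label it assigns to each possible image):

```latex
\begin{align*}
\text{bits per image}  &= 24 \times 1000 \times 1000 = 24{,}000{,}000 \\
\text{possible images} &= 2^{24{,}000{,}000} \\
\text{possible models} &= 2^{\left(2^{24{,}000{,}000}\right)} \gg 10^{1{,}000{,}000}
\end{align*}
```

So a space of 10^1,000,000 candidates would cover only a vanishingly small fraction of the possible classifiers.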
I don't know if this answers your question, but they have a technical guide here.
I didn't know this was a thing. Is there a post that explains why it isn't turned on by default? I looked around but couldn't find anything about agreement voting from less than 10 years ago, and none of those directly addressed that question anyway.
And are there any other types of voting that are turned off by default?
While friendly competition can be good in many contexts, I don't think this is one of them. The holiday is about a dedicated team who were willing to die together for their cause. I don't think competing to see who can go the longest without food would really be in the spirit of the holiday. I suspect it would also lead to bad feelings, having to police for cheating, etc.
The framing wasn't an intentional choice, I wasn't considering that aspect when I made the comment. I haven't been privy to any of the off-LW conflict about it, so it wasn't something that I was primed to look out for. I am not suggesting that there should be a community-wide standard (or that there shouldn't be). I intended it as "here's an idea that people may find interesting."
Thoughts on having part of the holiday be "have tasty food easily accessible (perhaps within sight range) during the fast"?
Pros:
This was probably meant sarcastically, but I do think that having part of the tradition be "have tasty food nearby during the fast" is worth consideration.
If the goal of rationalist holidays is to help us feel like a community, then this could make us feel more "special" and perhaps help towards that goal. (Many religions have holidays that call for a fast, but as far as I know none of them expects people to tempt themselves.)
It's also a nice display of self-control and the dangers of having instant gratification available. There's value in learning the ability to resist those urges for one's long-term benefit.
Well, the biggest problem is that it doesn't seem to work. I tested it in a 2-player game where we both locked in an answer, but the game didn't progress to the next round. I waited for the timer to run out, but it still didn't progress; it just stayed at 0:00. Changes in my probability are also not visible to the other players until I lock mine in.
A few more minor issues:
Questions about topics I don't know anything about result in me just putting the max-entropy distribution on them, which is fine if such questions are rare, but leads to unhelpful results if they make up a large proportion of all the questions. Most calibration tests I found pulled from generic trivia categories such as sports, politics, celebrities, science, and geography. I didn't find many that were domain-specific, so that might be a good area to focus on.
Some of them don't tell me what the right answers are at the end, or even which questions I got wrong, whi...
This looks super neat, thank you for sharing. I just did a quick test and can confirm that it is in fact riddled with bugs. If it would help, I can write up a list of what needs fixing.
Wouldn't an observed mismatch between assigned probability and observed probability count as Bayesian evidence towards miscalibration?
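To make that concrete (toy numbers of my own; the specific "overconfident" alternative is an arbitrary assumption), you can compare the likelihood of a track record under a well-calibrated hypothesis versus a miscalibrated one:

```python
# Toy Bayes-factor check: 7 hits out of 20 predictions made at "90% confident".
# Calibrated hypothesis: true hit rate 0.9. Miscalibrated (overconfident)
# hypothesis: true hit rate 0.6. The 0.6 is made up for illustration.
from math import comb

def binomial_likelihood(hits, trials, p):
    return comb(trials, hits) * p**hits * (1 - p)**(trials - hits)

hits, trials = 7, 20
calibrated    = binomial_likelihood(hits, trials, 0.9)
overconfident = binomial_likelihood(hits, trials, 0.6)
print(f"Bayes factor (overconfident : calibrated) ~ {overconfident / calibrated:.2g}")
# A large factor means the observed mismatch is strong evidence of miscalibration.
```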
I think you're confusing an agent's ignorance with other people's beliefs about that agent's ignorance. In your example of the police or the STD test, there is no benefit gained by the person being ignorant of the information; there is, however, a benefit to other people thinking the person is ignorant. If someone is able to find out whether they have an STD without anyone else knowing they've had the test, that's only a benefit for them. (Not counting the internal cognitive burden of having to explicitly lie.)
An open-ended probability calibration test is something I've been planning to build. I'd be curious to hear your thoughts on how the specifics should be implemented. How should they grade their own test in a way that avoids bias and still gives useful results?
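Here's the kind of scoring I had in mind, as a rough sketch (toy data; it only addresses the scoring side, not the bias in self-grading): have the user attach a probability to each answer, self-grade right/wrong afterward, and then report a proper scoring rule plus per-confidence-bucket hit rates.

```python
# Minimal sketch: Brier score plus per-bucket calibration from self-graded answers.
# Each entry is (stated probability of being correct, self-graded outcome).
results = [(0.9, True), (0.7, False), (0.6, True), (0.8, True)]  # toy data

brier = sum((p - (1.0 if correct else 0.0)) ** 2 for p, correct in results) / len(results)
print(f"Brier score: {brier:.3f} (0 is perfect; always saying 50% scores 0.25)")

# Bucket by stated confidence: "when you said ~80%, how often were you right?"
buckets = {}
for p, correct in results:
    buckets.setdefault(round(p, 1), []).append(correct)
for p, outcomes in sorted(buckets.items()):
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {p:.0%}: right {hit_rate:.0%} of the time ({len(outcomes)} answers)")
```

Since the Brier score is a proper scoring rule, honestly reporting your probabilities is optimal; the harder open question is keeping the right/wrong self-grading honest.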
Whether Omega ended up being right or wrong is irrelevant to the problem, since the players only find out if it was right or wrong after all decisions have been made. It has no bearing on what decision is correct at the time; only our prior probability of whether Omega will be right or wrong matters.
I think you have to consider what winning means more carefully.
A rational agent doesn't buy a lottery ticket because it's a bad bet. If that ticket ends up winning, does that contradict the principle that "rational agents win"?
That doesn't seem at all analogous. At the time they had the opportunity to purchase the ticket, they had no way to know it was going to win.
...An Irene who acts like your model of Irene will win slightly more when Omega makes an incorrect prediction (she wins the lottery), but will be given the million dollars far less commonly because
I think you're missing my point. After the $1,000,000 has been taken, Irene doesn't suddenly lose her free will. She's perfectly capable of taking the $1000; she's just decided not to.
You seem to think I'm making some claim like "one-boxing is irrational" or "Newcomb's problem is impossible", which is not at all what I'm doing. I'm trying to demonstrate that the idea of "rational agents just do what maximizes their utility and don't worry about having to have a consistent underlying decision theory" appears to result in a contradiction as soon as Irene's decision has been made.
Ah, that makes sense.
Some clarifications on my intentions writing this story.
Omega being dead and Irene having taken the money from one box before having the conversation with Rachel are both not relevant to the core problem. I included them as a literary flourish to push people's intuitions towards thinking that Irene should open the second box, similar to what Eliezer was doing here.
Omega was wrong in this scenario, which departs from the traditional Newcomb's problem. I could have written an ending where Rachel made the same arguments and Irene still decided against doing i...
I just did that to be consistent with the traditional formulation of Newcomb's problem, it's not relevant to the story. I needed some labels for the boxes, and "box A" and "box B" are not very descriptive and make it easy for the reader to forget which is which.
I don't find the simulation argument very compelling. I can conceive of many ways for Omega to arrive at a prediction with high probability of being correct that don't involve a full, particle-by-particle simulation of the actors.
In the case where you find yourself holding the $1,000,000 and the $1000 is still available, sure, you can pick it up. That only happens if either Omega failed to predict what you will do, or you somehow set things up such that you couldn't break your precommitment, or would have to pay a big price to do so.
I don't think that's true. The traditional Newcomb's problem could use the exact setup that I used here, the only difference would be that either the opaque box is empty, or Irene never opens the transparent box. The idea that the $1000 is always "available" to the player is central to Newcomb's problem.
making piece
should be
making peace
so it includes both asymptomatic cases
I think that "includes" should be "excludes"?
This is an interesting question, but I think your hypothesis is wrong.
Any pattern of physics that eventually exerts control over a region much larger than its initial configuration does so by means of perception, cognition, and action that are recognizably AI-like.
In order not to include things like an exploding supernova as "controlling a region much larger than its initial configuration", we would want to require that such patterns be capable of arranging matter and energy into an arbitrary but low-complexity shape, such as a giant smiley face in Life.
If ...
Ah, found the story. Wasn't quite as I remembered. (Search for "wrong number".)
https://arthurjensen.net/wp-content/uploads/2014/06/Speed-of-Information-Processing-in-a-Calculating-Prodigy-Shakuntala-Devi-1990-by-Arthur-Robert-Jensen.pdf