Cambridge maths student
Different minds use different criteria to evaluate an argument. Suppose that half the population are perfect rationalists, whose criteria for judging an argument depend only on Occam's razor and Bayesian updates. The other half are hard-coded biblical literalists, who only believe statements based on religious authority. So half the population will consider "Here are the short equations, showing that this concept has low Kolmogorov complexity" to be a valid argument, while the other half consider "Pope Clement said ..." to be a strong argument.
Suppose that any position that has strong religious and strong rationalist arguments for it is so obvious that no one is doubting or discussing it. Then most propositions believed by half the population have strong rationalist support, or strong religious support, but not both. If you are a rationalist and see one fairly good rationalist argument for X, you search for more info about X. Any religious arguments get dismissed as nonsense.
The end result is that the rationalists are having a serious discussion about AI risk among themselves. The religious dismiss AI as ludicrous based on some bible verse.
The religious people are having a serious discussion about the second coming of Christ and judgement day, which the rationalists dismiss as ludicrous.
The end result is a society where most of the people who have read much about AI risk think it's a thing, and most of the people who have read much about judgement day think it's a thing.
If you took a person from one side and forced them to read all the arguments on the other, they still wouldn't believe. Each side has good arguments under its own criteria of what a good argument is.
The rationalists say that the religious have poor epistemic luck, that there is nothing we can do to help them now, and that when superintelligence comes it can rewire their brains. The religious say that the rationalists are cursed by the devil, and that when judgement day comes they will be converted by the glory of God.
The rationalists are designing a super-intelligence, the religious are praying for judgement day.
Bad ideas and good ones can have similar social dynamics, because most of the social dynamics around an idea depend on human nature.
A duplicator world is much more strongly positive-sum than our current one. If I have any kind of nice material good, I can let you benefit from it at no cost to me. I would also expect the sheer shock to collapse many bureaucracies.

Picture a society where, say, 50% of people make something: someone who likes gardening grows some fresh veg, someone who likes sewing makes clothes. These people will not work very hard, and everyone else won't work at all. (The makers are doing it for much the same reason people have hobbies today. Putting a duplicator, a strawberry, and a sign saying "help yourself" on the porch takes next to no effort, and is a friendly thing to do.) The economy will run mostly on a take-a-copy, pass-it-on model.

Over time, a more complex economy will reappear, and the name of the currency will be customization. If you want a painting of yourself, or a coat tailored to your unique taste in fashion, you have to pay serious money for it. Large complex companies could be sustained that made, say, motorcars. There would be a team of people who knew how every part went together and had the tools to do complex custom jobs. If you just want a car that works, it costs you nothing or next to nothing. If you want a black and yellow striped car with extra large wheels, they are going to charge you for that, and they have an advantage in expertise that means they can do a better job with less effort than a garage mechanic. This would support an economy with some sort of R&D chain.

Some sort of copyright law might or might not exist, but enough people will be prepared to let you copy their stuff for free that it will only be a limit on some luxury or specialized goods. Like modern software: not all of it is open source, but unless you have some unusual and specific requirement, you can probably do it with open source software.
Here is a flawed dynamic in group conversations, especially among large groups of people with no common knowledge.
Suppose everyone is trying to build a bridge.
Alice: We could make a bridge by just laying a really long plank over the river.
Bob: According to my calculations, a single plank would fall down.
Carl: Scientists Warn Of Falling Down Bridges, Panic.
Dave: No one would be stupid enough to design a bridge like that, we will make a better design with more supports.
Bob: Do you have a schematic for that better design?
And, at worst, the cycle repeats.
The problem here is Carl. The message should be:
Carl: At least one attempt at designing a bridge has been calculated to show the phenomenon of falling down. It is probable that many other potential bridge designs share this failure mode. In order to build a bridge that won't fall down, someone will have to check any design for falling-down behavior before it is built.
This entire dynamic plays out the same whether the people actually deciding on building the bridge are incredibly cautious, never approving a design they weren't confident in, or totally reckless. The probability of any bridge actually falling down in the real world depends on their caution. But the process of cautious bridge builders finding a good design looks like them rejecting lots of bad ones. If the rejection of bad designs is public, people can accuse you of attacking a strawman; they can say that no one would be stupid enough to build such a thing. Even if they are right that no one would be stupid enough to build such a thing, it's still helpful to share the reason the design fails.
To stop Goodharting, don't measure. When someone walks in the door of the hiring department, spin a spinner. That determines what job the person has: no promotions, no firings. (If someone is bad enough, they will get sent to jail anyway.)
Part of the problem is that everyone in these companies is a smarmy, sharp-suited, liberal-arts-degree type. Hire a broader range of humanity. When you have everyone from Tibetan monks, to an ex drug dealer, to an eco warrior, to the sort of person who builds their own compiler in their free time, you should be good. If anyone knocks on the CEO's door at 3am on Christmas, wearing only a swimsuit and diving gear, and trying to explain why they are a good hire through the medium of interpretative dance, hire them on the spot.
You won't get a maze. Whether a madhouse is an improvement, I don't know.
I am a uni student from Scotland. At home, I have been snowed in for a few days. There would be enough food there to last two weeks. If we got really desperate, there are always the hens, and around a sack of grain in the garden. It probably wouldn't come to that, as there are large supplies of dried, tinned and frozen food, and of course sugar, flour, jam, etc. This isn't disaster prep; it just makes sense to keep a stockpile of long-lasting food when you have plenty of storage space and the shops are several miles away. There is also a stream for water and refrigeration if needed, a ton of firewood, and trees and tools if we need more. All in all, a pretty good place to hole up.
At uni, on the other hand, I have a small room rented for a year. Everything I want has to fit into the room, and has to be removed in the summer. There, the calculation for somewhat-but-not-very-useful items is different. Besides, the area is not known for hurricanes, wildfires or earthquakes. Rich first world governments tend to do things like dropping food in by helicopter if they really have to.
If you don't know what threshold ratio of FAI to AGI research is needed, you can still know that if your research beats the world average, you are increasing the ratio. Let's say that 2 units of FAI research are being produced for every 3 units of AGI, and that ratio isn't going to change. Then work that produces 3 units of FAI and 4 of AGI is beneficial. (It causes FAI in the scenario where FAI is slightly over 2/3 as difficult.)
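The arithmetic above can be sanity-checked in a few lines. The 2:3 world baseline and the 3:4 project are the numbers from the example; `global_ratio` is just an illustrative name:

```python
# If the rest of the world produces FAI and AGI research in a fixed 2:3
# ratio, any project whose own FAI:AGI ratio beats 2/3 pulls the global
# ratio upward, whatever the world's scale.
def global_ratio(world_fai, world_agi, my_fai, my_agi):
    return (world_fai + my_fai) / (world_agi + my_agi)

baseline = 2 / 3
assert 3 / 4 > baseline  # the project's own ratio beats the world average
# Adding the project raises the global ratio at any world scale:
for scale in (1, 10, 1000):
    assert global_ratio(2 * scale, 3 * scale, 3, 4) > baseline
```

With a world of exactly 2 and 3 units, the combined ratio is 5/7 ≈ 0.714, above the 2/3 baseline.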
Is it remotely plausible that FAI is easier? Suppose that there was one key insight. If you have that insight, you can see how to build FAI easily. From that insight, alignment is clearly necessary and not hard. Anyone with that insight will build an FAI, because doing so is almost no harder than building an AGI.
Suppose also that it is possible to build an AGI without this insight: you can hack together a huge pile of ad hoc tricks. This approach takes a lot of tricks, and no one trick is important.
In this model, building FAI could be much easier than building AGI without knowing how to make an FAI.
Safely and gradually enhancing human intelligence is hard. I agree that a team of human geniuses with unlimited time and resources could probably do it. But you need orders of magnitude more resources and thinking time than the fools "trying" to make UFAI.
A genetics project makes a lot of very smart babies; it finds it hard to indoctrinate them while educating them enough and preserving diversity. Militaristic boot camp will get them all marching in line, squash out most curiosity, and give little room for skill. Handing them off to foster parents with STEM backgrounds gets you a bunch of smart people with no organizing control. That is a shift in demographics; you have no hope of capturing all the value. Some will work on AI safety, intelligence enhancement or whatever; some will work in all sorts of jobs.
Whole brain emulation seems possible. I question how to get it before someone makes UFAI, but it's plausible we get that. If a group of smart, coordinated people end up with the first functioning mind uploading, and the first nanomachines, and are fine with duplicating themselves a lot, and there are fast enough computers to let them think really fast, then that is enough for a decisive strategic advantage. If they upload everyone else into a simulation that doesn't contain access to anything Turing complete (so no one can make UFAI within the simulation), then they could guide humanity towards a long term future without any superintelligence. They will probably figure out FAI eventually.
If intelligence is 50% genetic, and von Neumann was 1 in a billion, the clones will be 1 in 500. Regression to the mean.
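A rough sketch of that regression-to-the-mean arithmetic, using Python's standard library and the crude simplification that "50% genetic" means the clone's expected z-score is half of von Neumann's:

```python
from statistics import NormalDist

nd = NormalDist()                  # standard normal
z_vn = nd.inv_cdf(1 - 1e-9)        # 1-in-a-billion level, z is about 6.0
z_clone = 0.5 * z_vn               # 50% genetic -> expect half the z-score
tail = 1 - nd.cdf(z_clone)         # fraction of people above the clone's level
print(f"clone expected to be roughly 1 in {1 / tail:.0f}")
```

This comes out at roughly 1 in 700, the same ballpark as the 1-in-500 figure above.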
Suppose that in building the AI, we make an explicitly computable, hard-coded value function. For instance, if you want the agent to land between the flags, you might write an explicit, hard-coded function that returns 1 if the lander is between a pair of yellow triangles, else 0.
In standard machine learning, information is lost because you have a full value function but only train the network using its evaluation at a finite number of points.
Suppose I don't want the lander to land on the astronaut, who is wearing a blue spacesuit. I write code that says that any time there is a blue pixel below the lander, the utility is -10.
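A minimal sketch of such a hard-coded utility, combining the flag rule and the blue-pixel rule. The frame layout, pixel format, and color thresholds here are all illustrative assumptions, not any real environment's API:

```python
# `frame` is assumed to be a grid of (r, g, b) pixels, indexed frame[y][x],
# with y increasing downward; coordinates and thresholds are hypothetical.
def utility(frame, lander_x, lander_y, flag_left_x, flag_right_x):
    # Hard rule: any blue pixel directly below the lander scores -10.
    for y in range(lander_y + 1, len(frame)):
        r, g, b = frame[y][lander_x]
        if b > 200 and r < 100 and g < 100:   # crude test for "blue"
            return -10
    # Otherwise: 1 for being between the flags, else 0.
    return 1 if flag_left_x < lander_x < flag_right_x else 0
```

The point of writing it as explicit code is that the blue-pixel rule exists in full even if no blue pixel ever appears in training.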
Suppose that there are no astronauts in the training environment; in fact, nothing blue whatsoever. A system trained using some architecture that only relies on the utility of what it sees in training would not know this rule. A system that can take the code and read it would spot this info, but might not care about it. A system that generates potential actions, predicts what the screen would look like if it took those actions, and then sends that prediction to the hard-coded utility function, with automatic shutdown if the utility is negative, would avoid this problem.
If hypothetically, I can take any programmed function f:observations -> reward and make a machine learning system that optimizes that function, then inner alignment has been solved.
Consider this function:
```python
def foo():
    a = 0.0
    for i in range(1, 10**100):
        a = a + 1/i
    return a > 2
```
This is valid code that returns True.
Note that you can tell it returns True without doing 10^100 operations, and a good compiler could too.
Shouldn't this also be valid code?
```python
def foo():
    a = 0.0
    for i in range(1, ∞):
        a = a + 1/(i**2)
    return a > 1
```
There is a whole space of "programs" that can't be computed directly, but can still be reasoned about. Computing directly is a subset of reasoning.
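For these two examples, the reasoning a compiler could do is simple: the terms are all positive, so the partial sums only increase, and once one crosses the threshold the comparison is settled. A sketch of that argument as runnable code:

```python
import itertools

def exceeds(threshold, terms):
    """Decide sum(terms) > threshold for positive terms by stopping as
    soon as the running partial sum crosses the threshold. (For a
    convergent series that never crosses, this would loop forever; the
    point is that True cases are decidable in finitely many steps.)"""
    total = 0.0
    for t in terms:
        total += t
        if total > threshold:
            return True
    return False

# The 10**100-term harmonic sum: settled after just 4 terms.
print(exceeds(2, (1 / i for i in range(1, 10**100))))      # True

# The infinite sum of 1/i**2 (which converges to pi**2/6, about 1.645):
# settled after 2 terms, never touching the infinite tail.
print(exceeds(1, (1 / i**2 for i in itertools.count(1))))  # True
```

Neither call comes anywhere near exhausting its range; finite reasoning decides both the huge program and the "infinite" one.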