Re: Aristotle. A large part of what Aristotle wrote is science and math. If you felt you didn't learn anything from Aristotle, that's because only the non-science and non-math parts of Aristotle are usually taught, since science and math are usually taught in an ahistorical manner.
Barbara, Celarent, Darii, Ferio is good math. It is just that first-order logic and Venn diagrams are better, so we don't teach Barbara and the rest. One thing lost by this ahistorical teaching is how much better first-order logic is, and how difficult an advance it was when Frege first made it.
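To illustrate the point that Barbara is good math: it translates directly into first-order logic, where its validity can even be checked mechanically. A minimal sketch (the function and predicate names are my own illustration) that exhaustively verifies the first-order translation of Barbara over a small finite domain:

```python
# Barbara: all M are P; all S are M; therefore all S are P.
# First-order translation:
#   (forall x, M(x) -> P(x)) and (forall x, S(x) -> M(x))
#   entail (forall x, S(x) -> P(x)).
from itertools import product

def barbara_valid(domain_size=3):
    domain = range(domain_size)
    # Every interpretation of a unary predicate over the domain
    # is a tuple of booleans, one per domain element.
    interpretations = list(product([False, True], repeat=domain_size))
    for S, M, P in product(interpretations, repeat=3):
        premise1 = all(P[x] for x in domain if M[x])    # all M are P
        premise2 = all(M[x] for x in domain if S[x])    # all S are M
        conclusion = all(P[x] for x in domain if S[x])  # all S are P
        if premise1 and premise2 and not conclusion:
            return False  # counterexample found
    return True

print(barbara_valid())  # True: no counterexample exists
```

Of course the finite check is no substitute for a proof, but the ease of stating the syllogism as quantified implications is exactly the advance Frege's logic made possible.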
I am not aware of any single policy that can solve climate change by itself. What policy do these experts support? Let's say it is to eliminate all coal power stations in the world by magic. That is at best 20% of global emissions, so isn't that policy synergistic with geoengineering? For your favorite policy to compete with geoengineering, it would need to be capable of solving 100% of the problem, and I am not aware of any such policy whatsoever.
Samsung is a memory play, not a TSMC competitor. Logic semiconductors are what Samsung hopes to do in the future; they are not how it makes money now. In my opinion, Samsung's technical lead in memory is no less than TSMC's technical lead in logic. And transformative AI will require memory as much as CPUs, GPUs, and TPUs.
I would say the Corrigibility paper shares the same "feel" as certain cryptography papers. I think it is true that this feel is distinct, and not true that it means they are "not real".
For example, what does it mean for a cryptosystem to be secure? This is an important topic with impressive achievements, but it does feel different from the nuts and bolts of cryptography, like how to perform differential cryptanalysis. Indistinguishability under chosen-plaintext attack, the standard definition of semantic security in cryptography, does sound like "make up rules and then pretend they describe reality and prove results".
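For readers unfamiliar with the definition: IND-CPA is literally a made-up game, which is the point. A schematic sketch (toy cipher and class names are my own; this is the textbook game, not any particular paper's formulation), using a deterministic XOR cipher to show how a scheme loses the game:

```python
import secrets

# IND-CPA game: the adversary, given encryption-oracle access, submits
# two messages; the challenger encrypts one at random; the adversary
# guesses which. A scheme is IND-CPA secure if no efficient adversary
# guesses with probability meaningfully above 1/2.

def xor_encrypt(key, msg):
    # Toy deterministic cipher (NOT secure).
    return bytes(k ^ m for k, m in zip(key, msg))

def ind_cpa_game(adversary, encrypt, key_len=16):
    key = secrets.token_bytes(key_len)
    oracle = lambda m: encrypt(key, m)  # chosen-plaintext access
    m0, m1 = adversary.choose(oracle)   # two equal-length messages
    b = secrets.randbelow(2)
    challenge = encrypt(key, [m0, m1][b])
    return adversary.guess(oracle, challenge) == b

class BreakDeterministic:
    # Any deterministic cipher loses: the adversary re-encrypts m0
    # via the oracle and compares it to the challenge.
    def choose(self, oracle):
        self.m0 = b"\x00" * 16
        return self.m0, b"\xff" * 16
    def guess(self, oracle, challenge):
        return 0 if oracle(self.m0) == challenge else 1

wins = sum(ind_cpa_game(BreakDeterministic(), xor_encrypt) for _ in range(100))
print(wins)  # 100: the adversary wins every round
```

Whether winning this game corresponds to anything an attacker cares about in reality is exactly the "pretend the rules describe reality" feeling, and yet the definition has proved enormously productive.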
In a sense, I think all math papers with a focus on definitions (as opposed to proofs) feel like this. The proofs are correct but trivial, so the definitions are the real contribution, but the applicability of the definitions to the real world seems questionable. Proof-focused papers feel different because they are about accepted definitions whose applicability to the real world is not in question.
What do you think of Paul Christiano's argument that short timelines and fast takeoff are anti-correlated?
No. I think it is indeed the usual wisdom that slower takeoff goes with a shorter timeline. I found Paul Christiano's argument linked in the article pretty convincing.
Yes. From page 34 of the Supplementary Materials:
In practice, we observe that expert players tend to be very active in communicating, whereas those less experienced miss many opportunities to send messages to coordinate: the top 5% rated players sent almost 2.5 times as many messages per turn as the average in the WebDiplomacy dataset.
I am doubtful about this. I am unsure whether Cicero would score higher if it were more vindictive, so I am hesitant to call its game theory poor. A good analogy: I am hesitant to call AlphaGo's endgame moves poor even when they look 100% poor, because I am not sure whether AlphaGo would win more games if it played more human-like endgame moves.
Re 3: The Cicero team concedes they haven't overcome the challenge of maintaining coherency in chatting agents. They think they got away with it because 5 minutes is too short, and they expect games with longer negotiation periods to be more challenging.
Cicero is designed to be honest in the sense that all its messages are generated from its intents, where its intents are the moves Cicero in fact intends to play at the moment it says them (Cicero can change its mind after saying things), and at the end of the turn the played moves equal its last intents.
Not only does Cicero use its true intents to generate messages, it also tries to generate messages that correspond to those intents. Its dialogue model is trained to imitate humans in WebDiplomacy, but when humans intend to attack Belgium, they will sometimes say things like "I won't attack Belgium". That is, an AI can lie by forming an intent to attack Belgium, devising the lying intent "won't attack Belgium", and generating lying messages from the lying intent. Cicero doesn't do this: its intent input to the dialogue model is always truthful. An AI can also lie by forming an intent to attack Belgium and generating lying messages like "I won't attack Belgium" from the truthful intent, by imitating lying humans. Cicero doesn't do this either! The dialogue model is trained to imitate only truthful humans: the training data is filtered by a lie detector, and 5% of turns are filtered out.
That does not mean Cicero does not dissemble or mislead! There are three aspects to this. First, there is the messaging model, entirely separate from the dialogue model. The messaging model decides whether to send messages at all, and is trained to imitate humans. When humans intend to attack Belgium, held by France, they may not message France at all. Cicero copies this behavior.
Second, there is the topic model, also entirely separate. The topic model decides which intent to talk about, and is trained to imitate humans. When humans intend to attack Belgium, held by France, and also Norway, held by Russia, they may talk to France about Norway and to Russia about Belgium. Cicero also copies this behavior.
Third, there is the filtering model, also entirely separate. When Cicero intends to attack Belgium, held by France, maybe the messaging model decides to talk to France, the topic model decides to talk to France about Belgium, and the dialogue model decides to say "I will attack Belgium". That does not mean Cicero says "I will attack Belgium": the filtering model can veto it. In particular, the value-based filtering model estimates how saying something will impact Cicero's own utility. Eight messages are sampled from the dialogue model, their value impacts are calculated, importance is calculated from the value impacts, and in the 15% of most important situations, the bottom three messages are dropped and one of the remaining messages is picked at random.
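The value-based filtering step can be sketched roughly as follows. This is my reconstruction, not Facebook's implementation: in particular, the importance measure here (the spread of value impacts across candidates) is a guessed stand-in for whatever the paper actually computes, and all names are illustrative:

```python
import random

def value_filter(candidates, value_impact, importance_threshold):
    # candidates: messages sampled from the dialogue model
    # (the paper samples eight per situation).
    # value_impact(msg): estimated change in Cicero's own expected
    # utility from sending msg (a learned model in the real system).
    impacts = {m: value_impact(m) for m in candidates}

    # Guessed importance measure: how much the candidates differ
    # in value impact. The paper's actual formula may differ.
    importance = max(impacts.values()) - min(impacts.values())

    if importance >= importance_threshold:
        # In the most important situations (top 15% in the paper),
        # drop the three lowest-impact candidates...
        ranked = sorted(candidates, key=lambda m: impacts[m])
        candidates = ranked[3:]

    # ...and pick one of the remaining messages at random.
    return random.choice(candidates)
```

The design point this captures: the veto is value-based, so even a truthfully generated message gets suppressed when saying it would hurt Cicero, which is how an "honest" system still ends up withholding information.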