Formally proving that some X you could realistically build has property Y is way harder than building an X with property Y. I know of no exceptions (formal proof only applies to programs and other mathematical objects). Do you disagree?

I don't understand why you expect the existence of a "formal math bot" to lead to anything particularly dangerous, other than by being another advance in AI capabilities which goes along with other advances (which is fair I guess).

Human-long chains of reasoning (as used for taking action in the real world) neither require nor imply the ability to write formal proofs. Formal proofs are about math, and making use of math in the real world requires modelling, which is crucial, hard and usually very informal. You make assumptions that are obviously wrong, derive something from these assumptions, and make an educated guess that the conclusions still won't be too far from the truth in the ways you care about. In the real world, this only works when your chain of reasoning is fairly short (human-length), just as arbitrarily complex and long-term planning doesn't work, while math uses very long chains of reasoning. The only practically relevant application so far seems to be cryptography, because computers are extremely reliable and thus modeling is comparatively easy. However, plausibly it's still easier to break some encryption scheme than to formally prove that your practically relevant algorithm could break it.

LLMs that can do formal proof would greatly improve cybersecurity across the board (good for delaying some scenarios of AI takeover!). I don't think they would advance AI capabilities beyond the technological advances used to build them and increasing AI hype. However, I also don't expect to see useful formal proofs about useful LLMs in my lifetime (you could call this "formal interpretability"? We would first get "informal interpretability" that says useful things about useful models.) Maybe some other AI approach will be more interpretable.

Fundamentally, the objection stands that you can't prove anything about the real world without modeling, and modeling always yields a leaky abstraction. So we would have to figure out "assumptions that allow us to prove that AI won't kill us all while being only slightly false and in the right ways". This doesn't really solve the "you only get one try" problem. Maybe it could help a bit anyway?

I expect a first step might be an AI test lab with many layers of improving cybersecurity, ending at formally verified, air-gapped, with no interaction with humans. However, it doesn't look like people are currently worried enough to bother building something like this. I also don't see such an "AI lab leak" as the main path towards AI takeover. Rather, I expect we will deploy the systems ourselves and on purpose, finding ourselves at the mercy of competing intelligences that operate at faster timescales than us, and losing control.

I think it makes a huge difference that most cybersecurity disasters only cost money (or cause damage to a company's reputation and loss of customers' confidential information), while a biosecurity disaster can kill a lot of people. This post seems to ignore this?

Besides thinking it fascinating and perhaps groundbreaking, I don't really have original insights to offer. The most interesting democracies on the planet in my opinion are Switzerland and Taiwan. Switzerland shows what a long and sustained cultural development can do. Taiwan shows the potential for reform from within and innovation.

There's a lot of material to read, in particular the events after the sunflower movement in Taiwan. Keeping links within lesswrong: and

What's missing in this discussion is why one is talking to the "bad faith" actor in the first place.

If you're trying to get some information and the "bad faith" actor is trying to deceive you, you walk away. That is, unless you're sure that you're much smarter or have some other information advantage that allows you to get new useful information regardless. The latter case is extremely rare.

If you're trying to convince the "bad faith" actor, you either walk away or transform the discussion into a negotiation (it arguably was a negotiation in the first place). The post is relevant for this case. In such situations, people often pretend to be having an object level discussion although all parties know it's a negotiation. This is interesting.

Even more interesting is politics: you're trying to convince an amateur audience that you're right and someone else is wrong. The other party will almost always act "in bad faith" because otherwise the discussion would be taking place without an audience. You can walk away while accusing the other party of bad faith, but the audience can't really tell if you were "just about to lose the argument" or if you were arguing "less in bad faith than the other party", perhaps because the other party is losing the argument. Crucially, given that both parties are compelled to argue in bad faith, the audience is to some extent justified in not being moved by any object-level arguments, since they mostly cannot check if they're valid. They keep to the opinions they have been holding and the opinions of people they trust.

In this case, it might be worth it to move from the above situation, where the object level being discussed isn't the real object-level issue, as in the bird example, to one where a negotiation is taking place that is transparent to the audience. However, this is only possible if there is a competent fourth party arbitrating, as the competing parties really cannot give up the advantage of "bad faith". That's quite rare.

An upside: if the audience is actually interested in the truth, and if it can overcome the tribal drive to flock to "their side", they can maybe force the arguing parties to focus on the real issue and make object-level arguments in such a way that the audience can become competent enough to judge the arguments. Doing this is a huge investment of time and resources. It may be helped by all parties acknowledging the "bad faith" aspect of the situation and enforcing social norms that address it. This is what "debate culture" is supposed to do but, as far as I know, never really has.

My takeaway: don't be too proud of your debate culture where everyone is "arguing in good faith", if it's just about learning about the world. This is great, of course, but doesn't really solve the important problems.

Instead, try to come up with a debate culture (debate systems?) that can actually transform a beside-the-point, bad-faith apparent disagreement into a negotiation where the parties involved can afford to make their true positions explicitly known. This is very hard but we shouldn't give up. For example, some of the software used to modernize democracy in Taiwan seems like an interesting direction to explore.

I think any outreach must start with understanding where the audience is coming from. The people most likely to make the considerable investment of "doing outreach" are in danger of being too convinced of their position and thinking it obvious; "how can people not see this?".

If you want to have a meaningful conversation with someone and interest them in a topic, you need to listen to their perspective, even if it sounds completely false and missing the point, and be able to empathize without getting frustrated. For most people to listen and consider any object level arguments about a topic they don't care about, there must first be a relationship of mutual respect, trust and understanding. Getting people to consider some new ideas, rather than convincing them of some cause, is already a very worthy achievement.

Indeed, systems controlling the domestic narrative may become sophisticated enough that censorship plays no big role. No regime is more powerful and enduring than one which really knows what poses a danger to it and what doesn't, one which can afford to use violence, coercion and censorship in the most targeted and efficient way. What a small elite used to do to a large society becomes something that the society does to itself. However, this is hard and I assume will remain out of reach for some time. We'll see what develops faster: sophistication of societal control and the systems through which it is achieved, or technology for censorship and surveillance. I'd expect at least a "transition period" of censorship technology spreading around the world as all societies that successfully use it become sophisticated enough to no longer really need it.

What seems more certain is that AI will be very useful for influencing societies in other countries, where the sophisticated domestically optimal means aren't possible to deploy. This goes very well with exporting such technology.

Uncharitably, "Trust the Science" is a talking point in debates that have some component which one portrays as "fact-based" and which one wants to make an "argument" about based on the authority of some "experts". In this context, "trust the science" means "believe what I say".

Charitably, it means trusting that thinking honestly about some topic, seeking truth and making careful observations and measurements actually leads to knowledge, that knowledge is intelligibly attainable. This isn't obvious, which is why there's something there to be trusted. It means trusting that the knowledge gained this way can be useful, that it's worth at least hearing out people who seem or claim to have it, and that, whenever one willingly disbelieves or dismisses that knowledge, it's worth stopping for a moment to honestly question one's own motivations and priors and the origins of one's beliefs, and to ponder the possible consequences in case of failure. In this context, "trust the science" means "talk to me and we'll figure it out".

  1. There's a big difference between philosophy and thinking about unlikely scenarios in the future that are very different from our world. In fact, those two things have little overlap. Although it's not always clear, (I think) this discussion isn't about aesthetics, or about philosophy; it's about scenarios that are fairly simple to judge but have so many possible variations, and are so difficult to predict, that it seems pointless to even try. This feeling of futility is the parallel with philosophy, much of which just digests and distills questions into more questions, never giving an answer, until a question is no longer philosophy and can be answered by someone else.

The discussion is about whether or not human civilization will destroy itself due to negligence and lack of ability to cooperate. This risk may be real or imagined. You may care about future humans or not. But that doesn't make this either philosophy or aesthetics. The questions are very concrete, not general, and they're fairly objective (people agree a lot more on whether civilization is good than on what beauty is).

  1. I really don't know what you're saying. To attack an obvious straw man and thus give you at least some starting point for explaining further: Generally, I'd be extremely sceptical of any claim about some tiny coherent group of people understanding something important better than 99% of humans on earth. To put it polemically, for most such claims, either it's not really important (maybe we don't really know if it is?), it won't stay that way for long, or you're advertising for a cult. The phrase "truly awakened" doesn't bode well here... Feel free to explain what you actually meant rather than responding to this.

  2. Assuming these "ideologies" you speak of really exist in a coherent fashion, I'd try to summarize "Accelerationist ideology" as saying: "technological advancement (including AI) will accelerate a lot, change the world in unimaginable ways and be great, let's do that as quickly as possible", while "AI safety (LW version)" as saying "it might go wrong and be catastrophic/unrecoverable; let's be very careful". If anything, these ideas as ideologies are yet to get out into the world and might never have any meaningful impact at all. They might not even work on their own as ideologies (maybe we mean different things by that word).

So why are the origins interesting? What do you hope to learn from them? What does it matter if one of those is an "outgrowth" of one thing more than some other? It's very hard for me to evaluate something like how "shallow" they are. It's not like there's some single manifesto or something. I don't see how that's a fruitful direction to think about.

No offense, this reads to me as if it was deliberately obfuscated or AI-generated (I'm sure you didn't do either of these, this is a comment on writing style). I don't understand what you're saying. Is it "LW should focus on topics that academia neglects"?

I also didn't understand at all what the part starting with "social justice" is meant to tell me or has to do with the topic.

There has been some talk recently about long "filler-like" input (e.g. "a a a a a [...]") somewhat derailing GPT-3 and GPT-4, e.g. leading them to output what seems like random parts of their training data. Maybe this effect is worth mentioning and thinking about when trying to use filler input for other purposes.