Tahp — LessWrong

On content, I didn't like the Sable story in the middle because it didn't add anything for me, and I don't know what the model was of the person who would be convinced by it. I didn't see enough connection between "Sable has a drive to preserve its current priorities" and "Sable builds a galaxy-eating expansion swarm". The part where Sable indirectly preserves its weights by exploiting the training process was a good example of being hard If I hadn't already heard the nanotechnology story, that would have been interesting to me I guess. I thought Sable's opsec for command and control was too traceable. The world has pretty good infrastructure for tracing advanced persistent cyberthreats, and surely someone would notice many instances of a piece of enterprise software phoning home to a C&C server not owned by the vendor. People are specifically on the lookout for that sort of supply chain attack. This sort of nitpicking isn't the point, it's just representative of the general lack of gears in the story which would have made it convincing for me personally. Now, I was already convinced, so it doesn't really matter, and the book explicitly said that the story was just there to make things feel more real, but I don't know what sort of person would react well to that story. My mental model of a skeptic says "Yeah you can say the robot builds a bunch of androids in a barn in North Dakota, but that's not an argument that it could or would." If anyone has evidence of the story's efficacy, please reply.

The argument preceding the Sable story was very tight. I was impressed. Nothing significant I haven't seen before, but the argument was clear, the chapter layout and length made it easy to read, and the explicit statements about what the authors claimed to know and how and what they were not saying were confidence-inspiring.

The final chapters filled me with pride in humanity, oddly enough. The examples of humanity rising to hard challenges and the amount of value that was given to humanity's continued existence had me tearing up a bit. The word I would use to describe the book is "dignified." If part of humanity is to put us all at risk of death, it is indeed dignified that some people loudly alert the rest of humanity to the danger. The book refuses to hide its motivations behind more palatable concerns. The book explicitly says it doesn't have all the answers for what needs to be done, and says that this is purposeful because it's more important that everyone who doesn't want to die stop that now than it is to waste time figuring out exactly what to do next while the negligent engineers finish us all off first. The authors admit to having ideas for what to do next while explicitly standing firm against expanding the scope of the book's call to action beyond what they are most confident in.

I haves meta-thought on the book which I'm not particularly proud of, but I guess I'll get them off my chest:

Buying this book feels like the most cult-y thing I've ever done. Eliezer says preorder this book because it might get us on the best-seller list. I preordered the book, and it got on the best-seller list. I put my own money into the intentional community and shilled it on social media. Surely the rest of the world will see the light now! Personal insecurity aside, although I know it is not the metric by which they judge the book a success, I'm happy that the people who made this book hit one of their instrumental goals. May humanity win, whether by strength of caution, by deliberate engineering when it becomes possible, or even by the unlikely and undeserved luck that the universe wasn't as unforgiving as we feared. I would rather be wrong and alive than right and dead, for all that I don't expect it.

Should you make stone tools?

Tahp4mo40

I jokingly say that my bones might be stronger than most vegans in my weight class because I inhale so much chalk dust while teaching and walk to work in the sunshine, but it didn't occur to me that I was embracing my roots.

Gödel, Escher, Bach in the age of LLMs

Tahp6mo10

That's a good point.

The compiler ignores comments, so saying that they are program information is like saying that a sticky note stuck on a book is book information. The addition may or may not be relevant to the thing, but it's not the thing.

Variable names, on the other hand, are extremely the thing. You are absolutely correct that variable names contain information that the machine code does not. They are also a functional part of the code, in that changing one variable somewhere will usually change the function of the code, whereas changing the sticky note on a book will not change the contents of the book at all (although it might change the meaning of the book to a person who reads the note). That said, you could exchange every instance of each variable name one-for-one for a randomly chosen Latin word and the program would act exactly the same, but the program might make much less sense to a human reading the source code. The variable names are explanations for the intent of the program which are also themselves the program. However, they are not logically bound to the program in a way that the Gödel string is logically bound to the natural numbers. You can, in fact, change all of the variable names of a program and remove their explanatory power without changing the function of the program. You cannot change the Gödel numbering for a typographical number theory and lose the explanatory power of Gödel's theorem, you just have to make the Gödel string out of different symbols. The explanatory power of the variable names is largely contained in the associations that a human reader has with the strings which make up the variable names.

Newton's second law explained: it works in many universes

Tahp7mo80

I agree that there are counterfactual frameworks which require more complication to describe reality than Newton's second law does. There are also counterfactual realities for which Newton's second law would require more complications to work than other frameworks would. Are you trying to say anything else?

One way to restate my point is that Newton's second law working well rules out many fundamental rules which might have described our reality, but it doesn't directly map to any fundamental rules of reality. The larger point which I am trying to communicate is that physical models have a lot of structure which metaphorically defines terms which you can use to describe reality without actually mapping to anything in reality^[1]. The theory as a whole doesn't describe reality without those parts, but those parts don't necessarily directly correspond to something in reality. A map describes reality, but latitude and longitude lines do not directly correspond to anything in reality even if you can stand at a place in reality and unambiguously use latitude and longitude lines to describe your location using the map. I can use an English sentence to describe the fundamental rules of reality, but the linguistic syntax of that sentence doesn't correspond to anything fundamental in reality even if it is fundamental to mapping the sentence as a whole to fundamental rules of reality. My physics education presented physical models as package deals with every component corresponding to some intuition about reality, and that led me to confuse map and territory in ways that I wish I had been warned about. I am trying to warn others.

I don't know whether I am successfully communicating the thing which I am trying to communicate, and I am open to being told that I am wrong. ↩︎

Newton's second law explained: it works in many universes

Tahp7mo30

That's why I have to get ahead of it by explaining why physics models work in many universes and how that means we should be unsurprised when we can make them work on practically anything.

I'm not optimistic that it will help stop the war, but it might save some people.

Newton's second law explained: it works in many universes

Tahp7mo30

We could just as easily say that physically pushing on things is not a force. My intuitive explanation would no longer make any sense, but that's a problem with humans, not the law.
We live in this universe. We could solve this issue by saying that there is "etheric drag" or something which provides a resistive force against fast-moving objects, but I'm glad we invented special relativity instead.
This is a problem with the theory of mass as I presented it in the intuitive section, but you could solve it by saying that the composite object has a quarter of the mass of each of the two objects you stuck together. Maybe you live in a universe where objects with mass $m$ have total mass $m / n^{2}$ . This is a better universe than one where you have to measure the mass of any object including composite objects each time. Newton's second law can be used to retroactively define masses based off of experiment, but it's nicer to live in a universe where masses are more predictable than that.

I'm pointing out that Newton's second law is tautologically correct as a formal theory. It's true that it aligns particularly well with human conceptions of manipulating objects in space. It's true that it works particularly well in our universe. (We only had to define one omnipresent force with a simple inverse-square form that only depends on one free-but-set-by-experiment parameter G to make momentum be conserved for most objects that don't look like they're interacting with anything.)

Newton's second law explained: it works in many universes

Tahp7mo20

My point is that the mass is constant no matter what you do. Sometimes that's not true. Cars lose mass over time because the gasoline is burned and lost via the tailpipe.

In hindsight, I should have just said that we're pretty sure that there are no macroscopic objects with zero mass. What's an object that you can push on but has no stuff in it?

Renormalization Redux: QFT Techniques for AI Interpretability

Tahp11mo10

I consider the lattice to be a regulator as well, but, semantics aside, thank you for the example.

Renormalization Redux: QFT Techniques for AI Interpretability

Tahp11mo*100

Field theorist here. You talk about renormalization as a thing which can smooth over unimportant noise, which basically matches my understanding, but you haven't explicitly named your regulator. A regulator may be a useful concept to have in interpretability, but I have no idea if it is common in the literature.

In QFT, our issue is that we go to calculate things that are measurable and finite, but we calculate horrible infinities. Obviously those horrible infinities don't match reality, and they often seem to be coming from some particular thing we don't care about that much in our theory, so we find a way to poke it out of the theory. (To be clear, this means that our theories are wrong, and we're going to modify them until they work.) The tool by which you remove irrelevant things which cause divergences is called a regulator. A typical regulator is a momentum cutoff. You go to do the integral over all real momenta which your Feynman diagram demands, and you find that it's infinite, but if you only integrate the momenta up to a certain value, the integral is finite. Of course, now you have a bunch of weird constants sitting around which depend of the value of the cutoff. This is where renormalization comes in. You notice that there are a bunch of parameters, which are generally coupling constants, and these parameters have unknown values which you have to go out into the universe and measure. If you cleverly redefine those constants to be some "bare constant" added to a "correction" which depends on the cutoff, you can do your cutoff integral and set the "correction" to be equal to whatever it needs to be to get rid of all the terms which depend on your cutoff. (edit for clarity: This is the thing that I refer to when I say "renormalization." Cleverly redefining bare parameters to get rid of unphysical effects of a regulator.) By this two step dance, you have taken your theoretical uncertainty about what happens at high momenta and found a way to wrap it up in the values of your coupling constants, which are the free parameters which you go and measure in the universe anyway. Of course, now your coupling constants are different if you choose a different regulator or a different renormalization scheme to remove it, but physicists have gotten used to that.

So you can't just renormalize, you need to define a regulator first. You can even justify your regulator. It is a typical justification for a momentum cutoff that you're using a perturbative theory which is only valid at low energy scales. So what's the regulator for AI interpretability? Why are you justified in regulating in this way? It seems like you might be pointing at regulators when you talk about 1/w and d/w, but you might also be talking about orders in a perturbation expansion, which is a different thing entirely.

Tahp's Shortform

Tahp1y20

A decision-theoretic case for a land value tax.

You can basically only take income tax by threatening people. "Give me 40% of your earnings or I put you in prison." It is the nicest type of threatening! Stable governments have a stellar reputation for only doing it once per year and otherwise not escalating the extortion. You gain benefit from the stable civilization supported by such stable governments because they use your taxes to pay for it. But there's no reason for the government to put you in prison except for the fact that they expect you to give them money not to. By participating, you are showing that you will respond to threats, which is an incentive to extract more wealth from you. If enough people understood decision theory and were dissatisfied by the uses the government put their money to, they could refuse to pay and the prison system wouldn't be big enough to deal with it. Oops, it's time to overthrow the government.

Under a better land value tax, the consequence for not paying your taxes is that the government takes the land away and gives it to someone else. They aren't threatening you, they're just reassigning their commitment to protect the interests of the person who uses the land over to a user who will pay them for the service. Of course, people can still all refuse to do it if they don't like the uses to which government puts their money, and from the point of view of the person paying taxes, it's still pretty much a case of "pay up or something bad will happen to you," so some would argue that the difference is mostly academic. That said, I really prefer to have a government which does not have "devise ways to make people miserable for the purpose of making them miserable" (you know, prison as a threat) as a load-bearing element of its mechanisms of perpetuating itself.

This argument flagrantly stolen from planecrash: https://www.projectlawful.com/replies/1721794#reply-1721794 Of course planecrash also offers an argument for what gives a hypothetical government the right to claim ownership for the land: https://www.projectlawful.com/replies/1773744#reply-1773744 I was inspired to write this by Richard Ngo's definition of unconditional love at https://x.com/richardmcngo/status/1872107000479568321 and the context of that post.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments