Nathaniel Monson

Mathematician turned alignment researcher. Probably happy to chat about math, current ML, or long-term AI thoughts.

The basics - Nathaniel Monson (nmonson1.github.io)

Comments

Minor nitpicks (a small sampling sketch of the three readings follows this list):

- I read "1 angstrom of uncertainty in 1 atom" as the location being normally distributed with mean <center> and SD 1 angstrom, or as uniformly distributed in a solid sphere of radius 1 angstrom. Taken literally, though, "perturb one of the particles by 1 angstrom in a random direction" is distributed on the surface of the sphere (the particle is known to be exactly 1 angstrom from <center>).
- The answer will absolutely depend on the temperature. (In a neighborhood of absolute zero, the final positions of the gas particles are very close to the initial positions.)
- The answer also might depend on the exact starting configuration. While I think most configurations would end up ~50/50 chance after 20 seconds, there are definitely configurations that would be stably strongly on one side.
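To make the distinction concrete, here is a minimal sketch of the three readings. The sampling code and names are my own illustration, not anything from the original post:

```python
import numpy as np

# Three ways to read "1 angstrom of uncertainty" in one particle's position.
# Purely illustrative; setup and function names are my own.
rng = np.random.default_rng(0)
R = 1e-10  # 1 angstrom, in metres

def gaussian_perturbation():
    # Normally distributed about the centre, SD of 1 angstrom per axis.
    return rng.normal(0.0, R, size=3)

def uniform_ball_perturbation():
    # Uniform inside the solid sphere of radius 1 angstrom.
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    radius = R * rng.uniform() ** (1 / 3)  # r ~ R * u^(1/3) gives uniform density in the ball
    return radius * direction

def sphere_surface_perturbation():
    # The literal reading: exactly 1 angstrom away, in a uniformly random direction.
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    return R * direction
```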

Nothing conclusive below, but things that might help (a rough sketch of these estimates follows the list):

- A back-of-envelope calculation said the single uncertain particle has ~(10 million * sqrt(temp in K)) collisions/sec.
- If I'm using MSD right (big if!), then at STP, particles move from their initial position by only about 5 cm in 20 seconds (they cover a massive distance, but the Brownian motion cancels in expectation).
- I think that at standard temperature, this would be at roughly 1/50 standard pressure?
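For what it's worth, here is a rough kinetic-theory sketch of the same kind of estimate, assuming a nitrogen-like gas at STP; the hard-sphere diameter, the crude D = λv̄/3 diffusion coefficient, and the sqrt(6Dt) displacement formula are my assumptions, not the original back-of-envelope numbers:

```python
import math

# Rough kinetic-theory estimate for a nitrogen-like gas at STP.
# All parameters below are assumptions for illustration.
k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 273.15           # temperature, K
P = 101_325.0        # pressure, Pa (1 atm)
m = 4.65e-26         # mass of an N2 molecule, kg
d = 3.7e-10          # effective hard-sphere diameter, m

n = P / (k_B * T)                                 # number density, 1/m^3
sigma = math.pi * d**2                            # collision cross-section, m^2
v_bar = math.sqrt(8 * k_B * T / (math.pi * m))    # mean molecular speed, m/s

nu = math.sqrt(2) * n * sigma * v_bar             # collision frequency, 1/s
lam = 1 / (math.sqrt(2) * n * sigma)              # mean free path, m
D = lam * v_bar / 3                               # crude self-diffusion coefficient, m^2/s

t = 20.0                                          # seconds
rms_displacement = math.sqrt(6 * D * t)           # 3D random-walk RMS displacement, m

print(f"collisions per second: {nu:.2e}")
print(f"RMS displacement in 20 s: {rms_displacement * 100:.1f} cm")
```

At 1 atm this gives a few billion collisions per second and a displacement of a few centimetres in 20 seconds; since the collision rate scales with number density and the diffusion coefficient scales inversely with it, the lower-pressure scenario would collide less often but drift farther.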

"we don't know if deceptive alignment is real at all (I maintain it isn't, on the mainline)."

You think it isn't a substantial risk of LLMs as they are trained today, or that it isn't a risk of any plausible training regime for any plausible deep learning system? (I would agree with the first, but not the second)

I agree in the narrow sense of different from bio-evolution, but I think it captures something tonally correct anyway.

Answer by Nathaniel Monson, Jan 03, 2024

I like "evolveware" myself.

I'm not really sure how it ended up there--probably childhood teaching inducing that particular brain-structure? It's just something that was a fundamental part of who I understood myself to be, and how I interpreted my memories/experiences/sense-data. After I stopped believing in God, I definitely also stopped believing that I existed. Obviously, this-body-with-a-mind exists, but I had not identified myself as being that object previously--I had identified myself as the-spirit-inhabiting-this-body, and I no longer believed that existed.

This is why I added "for the first few". Let's not worry about the location, just say "there is a round cube" and "there is a teapot".

Before you can get to either of these axioms, you need some things like "there is a thing I'm going to call reality that it's worth trying to deal with" and "language has enough correspondence to reality to be useful". With those and some similar very low-level base axioms in place (and depending on your definitions of round and cube and teapot), I agree that one or another of the axioms could reasonably be called more or less reasonable, rational, probable, etc.

I think when I believed in God, it was roughly third on the list? Certainly before usefulness of language. The first two were something like me existing in time, with a history and memories that had some accuracy, and sense-data being useful.

Answer by Nathaniel Monson, Nov 24, 2023

I don't think I believe in God anymore--certainly not in the way I used to--but I think if you'd asked me 3 years ago, I would have said that I take it as axiomatic that God exists. If you have any kind of consistent epistemology, you need some base beliefs from which to draw the conclusions and one of mine was the existence of an entity that cared about me (and everyone on earth) on a personal level and was sufficiently more wise/intelligent/powerful/knowledgeable than me that I may as well think of it as infinitely so.

I think the religious people I know who've thought deeply about their epistemology take either the existence of God or the reliability of a sort of spiritual sensory modality as an axiom.

While I no longer believe in God, I don't think I had a perspective any less epistemically rational then than I do now. I don't think there's a way to use rationality to pick axioms, the process is inherently arational (for the first few, anyway).

That's fair. I guess I'm used to linkposts which are either full, or a short enough excerpt that I can immediately see they aren't full.

I really appreciated both the original linked post and this one. Thank you, you've been writing some great stuff recently.

One strategy I have, as someone who simultaneously would like to be truth-committed and also occasionally jokes or teases loved ones ("the cake you made is terrible! No one else should have any, I'll sacrifice my taste buds to save everyone!"), is to have triggers for entering Quaker-mode; if someone asks me a question involving "really" or "actually", I try to switch my demeanour to clearly sincere, and give a literally honest answer. I... hope? that having an explicit mode of truth this way blunts some of the negatives of frequently functioning as an actor.
