Bella is seeing a psychotherapist, but they treat her fear as something irrational. This doesn't help, and only makes Bella more anxious. She feels like even her therapist doesn't understand her.
How would one find a therapist in their local area who's aware of what's going on in EA/rat circles, such that they wouldn't treat statements about, say, x-risks as schizophrenic/paranoid?
I am very interested in this, especially in the context of alignment research and solving not-yet-understood problems in general. Since I have no strong commitments this month (and was going to do something similar to this anyways), I will try this every day for the next two weeks and report back on how it goes (writing this comment as a commitment mechanism!)
...Have a large group of people attempt to practice problems from each domain, randomizing the order that they each tackle the problems in. (The ideal version of this takes a few months)
...
As part of eac
I had something like locality in mind when writing this shortform, the context being: [I'm in my room -> I notice itch -> I realize there's a mosquito somewhere in my room -> I deliberately pursue and kill the mosquito that I wouldn't have known existed without the itch]
But, again, this probably wouldn't amount to much selection pressure, partially because the vast majority of the mosquito population lives in places where such locality doesn't hold, i.e. in open environments.
Makes sense. I think we're using the terms differently in scope. By "DL paradigm" I meant to encompass the kind of stuff you mentioned (RL-directing-SS-target (active learning), online learning, different architecture, etc) because they really seemed like "engineering challenges" to me (despite them covering a broad space of algorithms) in the sense that capabilities researchers already seem to be working on & scaling them without facing any apparent blockers to further progress, i.e. in need of any "fundamental breakthroughs"—by which I was pointing more at paradigm shifts away from DL like, idk, symbolic learning.
But the evolutionary timescale at which mosquitos can adapt to avoid detection must be faster than that of humans adapting to find mosquitos itchy! Or so I thought - my current boring guess is that (1) mechanisms for the human body to detect foreign particles are fairly "broad", (2) the required adaptation from the mosquitos to evade them are not-way-too-simple, and (3) we just haven't put enough selection pressure to make such change happen.
To me, the fact that the human brain basically implements SSL+RL is very very strong evidence that the current DL paradigm (with a bit of "engineering" effort, but nothing like fundamental breakthroughs) will kinda just keep scaling until we reach point-of-no-return. Does this broadly look correct to people here? Would really appreciate other perspectives.
I mostly think “algorithms that involve both SSL and RL” is a much broader space of possible algorithms than you seem to think it is, and thus that there are parts of this broad space that require “fundamental breakthroughs” to access. For example, both AlexNet and differentiable rendering can be used to analyze images via supervised learning with gradient descent. But those two algorithms are very very different from each other! So there’s more to an algorithm than its update rule.
See also 2nd section of this comment, although I was emphasizing alignment-...
I have a slightly different takeaway. Yes, techniques similar to current techniques will most likely lead to AGI, but it's not literally 'just scaling LLMs'. The actual architecture of the brain is meaningfully different from what's being deployed right now, so it is different in one sense. On the other hand, it's not like the brain does something completely different, and proposals that are much closer to the brain's architecture are in the literature (I won't name them here...). It's plausible that some variant of those will lead to true AGI. Pure hardware scal...
What are the errors in this essay? As I'm reading through the Brain-like AGI sequence I keep seeing this post being referenced (but this post says I should instead read the sequence!)
I would really like to have a single reference post of yours that contains the core ideas about phasic dopamine, rather than the reference being the sequence posts (which are heavily dependent on a bunch of previous posts; also, Posts 5 and 6 feel more high-level than this one?)
Answering my own question, review / survey articles like https://arxiv.org/abs/1811.12560 seem like a pretty good intro.
Mildly surprised how some verbs/connectives barely play any role in conversations, even in technical ones. I just tried directed babbling with someone, and (I think?) I learned quite a lot about Israel-Pakistan relations with almost no stress coming from eg needing to make my sentences grammatically correct.
Example of (a small part of) my attempt to summarize my understanding of how Jews migrated in/out of Jerusalem over the course of history:
...They here *hand gesture on air*, enslaved out, they back, kicked out, and boom, they everywhere.
(audience nods, giv
Could you explain more what you mean by this?
My (completely amateur) understanding is that the "extra" semantic and syntactic structure of written and spoken language does two things.
One, it adds redundancy and reduces error. Simple example, gendered pronouns mean that when you hear "Have you seen Laurence? She didn't get much sleep last night." you have a chance to ask the speaker for clarification and catch if they had actually said "Laura" and you misheard.
Two, it can be used as a signal. The correct use of jargon is used by listeners or readers as a proxy for competence. Or many typos in your text will indicate to readers that you haven't put much effort into what you're saying.
Also, davidad's Open Agency Architecture is a very concrete example of what such a non-antisocial pivotal act that respects the preferences of various human representatives would look like (i.e. a pivotal process).
Perhaps not realistically feasible in its current form, yes, but davidad's proposal suggests that there might exist such a process, and we just have to keep searching for it.
Agree that the current AI paradigm can be used to make significant progress in alignment research if used correctly. I'm thinking something like Cyborgism: leaving most of the "agency" to humans and leveraging prosaic models to boost researcher productivity; being highly specialized in scope, the trained systems wouldn't involve dangerous consequentialist cognition.
However, the problem is that this isn't what OpenAI is doing - iiuc, they're planning to build a full-on automated researcher that does alignment research end-to-end, for which orthonormal ...
Complaint with Pugh's real analysis textbook: He doesn't even define the limit of a function properly?!
It's implicitly defined together with the definition of continuity, where the condition is |x−a|<δ ⟹ |f(x)−f(a)|<ε, but in Chapter 3 when defining differentiability he implicitly switches the condition to 0<|x−a|<δ without even mentioning it (nor the requirement that a now needs to be an accumulation point!) While Pugh has its own benefits, coming from Terry Tao's analysis textbook backgrou...
Maybe you should email Pugh with the feedback? (I audited his honors analysis course in fall 2017; he seemed nice.)
As far as the frontier of analysis textbooks goes, I really like how Schröder Mathematical Analysis manages to be both rigorous and friendly: the early chapters patiently explain standard proof techniques (like the add-and-subtract triangle inequality gambit) to the novice who hasn't seen them before, but the punishing details of the subject are in no way simplified. (One wonders if the subtitle "A Concise Introduction" was intended ironically...
Any advice on reducing neck and shoulder pain while studying? For me that's my biggest blocker to being able to focus longer (especially for math, where I have to look down at my notes/book for a long period of time). I'm considering stuff like getting a standing desk or doing regular back/shoulder exercises. Would like to hear what everyone else's setups are.
I've used Pain Science in the past as a resource and highly, highly endorse it. Here is an article they have on neck pain.
Train skill of noticing tension and focus on it. Tends to dissolve. No that's not so satisfying but it works. Standing desk can help but it's just not that comfortable for most.
I still have lots of neck and shoulder tension, but the only thing I've found that can reliably lessen it is doing some hard work on a punching bag for about 20 minutes every day, especially hard straights and jabs with full extension.
Man, deviation arguments are so cool:
I used to try out near-random search on ideaspace, where I made a quick app that spat out 3~5 random words from a dictionary of interesting words/concepts that I curated, and I spent 5 minutes every day thinking very hard on whether anything interesting came out of those combinations.
Of course I knew random search on exponential space was futile, but I got a couple cool invention ideas (most of which turned out to already exist), like:
algebraic geometry in infinite dimensions (algebraic geometric ... functional analysis?!) surely sounds like a challenge, damn.
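For anyone curious, the quick app mentioned above really was just a few lines; a minimal sketch of the kind of thing I mean (concepts.txt is a stand-in for my hand-curated word list):

```python
import random

# Stand-in for a hand-curated file of interesting words/concepts, one per line.
with open("concepts.txt") as f:
    words = [line.strip() for line in f if line.strip()]

# Spit out 3~5 random concepts to stare at for 5 minutes.
print(", ".join(random.sample(words, k=random.randint(3, 5))))
```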
gwern's take on a similar paper (Tinystories), in case anyone was wondering. Notable part for me:
...
Now, what would be really interesting is if they could go beyond the in-domain tasks and show something like meta-learning. That's supposed to be driven by the distribution and variety of Internet-scale datasets, and thus should not be elicited by densely sampling a domain like this.
So, it seems that scaling up isn't the only thing that matters, and data quality can be more important than data quantity or parameter count. (You hear that, gwern?)
Apparently someone didn't actually read my scaling hypothesis essay (specifically, the parts about why pretraining works and the varieties of blessings of scale). I have been pointing out for a long time that NNs are overparameterized and almost all training data is useless (which is a big part of why RL will be important, because RL lets you make the right data, or see meta-learning or data...
I wonder if the following would make it possible to study textbooks more efficiently using LLMs:
When I study textbooks, I spend a significant amount of time improving my mental autocompletion, like being able to familiari...
What's a good technical introduction to Decision Theory and Game Theory for alignment researchers? I'm guessing standard undergrad textbooks don't include, say, content about logical decision theory. I've mostly been reading posts on LW but as with most stuff here they feel more like self-contained blog posts (rather than textbooks that build on top of a common context) so I was wondering if there was anything like a canonical resource providing a unified technical / math-y perspective on the whole subject.
The MIRI Research Guide recommends An Introduction to Decision Theory and Game Theory: An Introduction. I have read neither and am simply relaying the recommendation.
There's still some pressure, though. If the bites were permanently not itchy, then I may have not noticed that the mosquitos were in my room in the first place, and consequently would less likely pursue them directly. I guess that's just not enough.
There’s also positive selection for itchiness. Mosquito spit contains dozens of carefully evolved proteins. We don’t know what they all are, but some of them are anticoagulants and anesthetics. Presumably they wouldn’t be there if they didn’t have a purpose. And your body, when it detects these foreign proteins, mounts a protective reaction, causing redness, swelling, and itching. IIRC, that reaction does a good job of killing any viruses that came in with the mosquito saliva. We’ve evolved to have that reaction. T...
Why haven't mosquitos evolved to be less itchy? Is there just not enough selection pressure posed by humans yet? (yes, probably) Or are they evolving in that direction? (they have of course already evolved to be less itchy while biting, but not enough to make that lack of itch permanent)
this is a request for help i've been trying and failing to catch this one for god knows how long plz halp
tbh would be somewhat content coexisting with them (at the level of houseflies) as long as they evolved the itch and high-pitched noise away, modulo disease risk considerations.
I believe mosquitos do inject something to suppress your reaction to them, which is why you don't notice bug bites until long after the bug is gone. There's no reproductive advantage to the mosquito to extending that indefinitely.
The reason mosquito bites itch is because they are injecting saliva into your skin. Saliva contains mosquito antigens, foreign particles that your body has evolved to attack with an inflammatory immune response that causes itching. The compound histamine is a key signaling molecule used by your body to drive this reaction.
In order for the mosquito to avoid provoking this reaction, they would either have to avoid leaving compounds inside of your body, or mutate those compounds so that they do not provoke an immune response. The human immune system is an adv...
Because they have no reproductive advantage to being less itchy. You can kill them while they’re feeding, which is why they put lots of evolutionary effort into not being noticed. (They have an anesthetic in their saliva so you are unlikely to notice the bite.) By the time you develop the itchy bump, they’ve flown away and you can’t kill them.
Having lived ~19 years, I can distinctly remember around 5~6 times when I explicitly noticed myself experiencing totally new qualia, with my inner monologue going “oh wow! I didn't know this dimension of qualia was a thing.” Examples:
Sunlight scattered by the atmosphere on cloudless mornings during the hour before sunrise inspires a subtle feeling ("this is cool, maybe even exciting") that I never noticed till I started intentionally exposing myself to it for health reasons (specifically, making it easier to fall asleep 18 hours later).
More precisely, I might or might not have noticed the feeling, but if I did notice it, I quickly forgot about it because I had no idea how to reproduce it.
I have to get away from artificial light (streetlamps) (and from direct (yellow) sunlight) for the ...
I observed new visual qualia of colors while using some light machine.
Also, when I first came to Italy, I had a feeling as if the whole rainbow of color qualia had changed.
i absolutely hate bureaucracy, dumb forms, stupid websites etc. like, I almost had a literal breakdown trying to install Minecraft recently (and eventually failed). God.
I think what's so crushing about it is that it reminds me that the wrong people are designing things, and that they won't allow them to be fixed, and I can only find solace in thinking that the inefficiency of their designs is also a sign that they can be defeated.
This shortform just reminded me to buy a CO2 sensor and, holy shit, turns out my room is at ~1500ppm.
While it's too soon to say for sure, this may actually be the underlying reason for a bunch of problems I noticed myself having primarily in my room (insomnia, inability to focus or read, high irritability, etc).
Although I always suspected bad air quality, it really is something to actually see the number with your own eyes, wow. Thank you so, so much for posting about this!!
I am so glad it helped. :)))
It is maddening otherwise; focus is my most valuable good, and the reasons for it failing can be so varied and hard to pinpoint. The very air you breathe undetectably fucking with you is just awful.
I also have the insomnia and irritability issue, it is insane. I've had instances where me and my girlfriend are snapping at each other like annoyed cats, repeatedly apologising and yet then snapping again over absolutely nothing (who ate more of the protein bars, why there is a cat toy on the floor, total nonsense), both of us upset...
One of the rare insightful lessons from high school: Don't set your AC to the minimum temperature even if it's really hot, just set it to where you want it to be.
It's not like the air released gets colder with a lower target temperature, because most ACs (according to my teacher, I haven't checked lol) are just simple control systems that turn themselves on/off around the target temperature, meaning the time it takes to reach a certain temperature X is independent of the target temperature (as long as it's lower than X)
... which is embarrassingly obvious in hindsight.
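To spell out the teacher's model (a toy simulation with made-up numbers, assuming a pure on/off controller and a constant cooling rate while the compressor runs):

```python
import itertools

def minutes_to_reach(target, start=30.0, goal=25.0, cooling_rate=0.5):
    """Toy on/off AC: the compressor runs at full power whenever the room is
    above `target`, so the cooling trajectory down to `goal` is identical for
    every target below `goal`."""
    temp = start
    for minute in itertools.count(1):
        if temp > target:          # compressor on
            temp -= cooling_rate
        if temp <= goal:
            return minute

for target in (24, 20, 16):
    print(target, minutes_to_reach(target))  # same time-to-25°C for all three
```

Setting the target to the minimum only changes how long the unit keeps cycling after the room is already where you want it.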
Well, he is right about some ACs being simple on/off units.
But there also exist units that can change cycle speed; it's basically the same thing except the motor driving the compression cycle can vary in speed.
In case you were wondering, they're called inverters. And when buying new today, you really should get an inverter (for efficiency).
God, I wish real analysis was at least half as elegant as any other math subject — way too many pathological examples that I couldn't care less about. I've heard some good things about constructivism though; hopefully analysis is done better there.
As a general reflection on undergraduate mathematics, imho there is way too much emphasis on real analysis. Yes, knowing how to be rigorous is important, being aware of pathological counterexamples is important, and real analysis is used all over the place. But there is so much more to learn in mathematics than real analysis, and the focus on minor technical issues here is often a distraction from developing a broad & deep mathematical background.
For most mathematicians (and scientists using serious math) real analysis is only a small part of the...
Update: huh, nonstandard analysis is really cool. Not only are things much more intuitive (by using infinitesimals from hyperreals instead of using epsilon-delta formulation for everything), by the transfer principle all first order statements are equivalent between standard and nonstandard analysis!
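For a taste of what "more intuitive" means here (this is just the textbook nonstandard characterization, nothing original): the derivative becomes a literal quotient,

$$f'(x) = \operatorname{st}\!\left(\frac{f(x+\varepsilon)-f(x)}{\varepsilon}\right) \quad \text{for any nonzero infinitesimal } \varepsilon,$$

where st takes the standard part, and the transfer principle is what licenses moving the resulting first-order statements back and forth between the reals and the hyperreals.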
Yeah, real analysis sucks. But you have to go through it to get to delightful stuff— I particularly love harmonic and functional analysis. Real analysis is just a bunch of pathological cases and technical persnicketiness that you need to have to keep you from steering over a cliff when you get to the more advanced stuff. I’ve encountered some other subjects that have the same feeling to them. For example, measure-theoretic probability is a dry technical subject that you need to get through before you get the fun of stochastic differ...
I think the point of having an explicit human-legible world model / simulation is to make desiderata formally verifiable, which I don't think would be possible with a blackbox system (like an LLM w/ wrappers).
Also important to note:
The phenomenon you call by names like "goals" or "agency" is one possible shadow of the deep structure of optimization - roughly, preimaging outcomes onto choices by reversing a complicated transformation.
i.e. if we were to pin down something we actually care about, that'd be "a system exhibiting consequentialism", because those are the kind of systems that will end up shaping our lightcone and more. Consequentialism is convergent in an optimization process, i.e. the "deep structure of optimization". Terms like "g...
re: reducing magic and putting bounds, I'm reminded of Cleo Nardo's Hodge Podge Alignment proposal.
moments of microscopic fun encountered while studying/researching:
That means the problem is inherently unsolvable by iteration. "See what goes wrong and fix it" auto-fails if The Client cannot tell that anything is wrong.
Not at all meant to be a general solution to this problem, but I think that a specific case where we could turn this into something iterable is by using historic examples of scientific breakthroughs - consider past breakthroughs to a problem where the solution (in hindsight) is overdetermined, train the AI on data filtered by date, and The Client evaluates the AI solely based on how close the AI approach...
Therefore, the longer you interact with the LLM, eventually the LLM will have collapsed into a waluigi. All the LLM needs is a single line of dialogue to trigger the collapse.
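(One toy way to unpack the quoted claim, in the spirit of the post's attractor-state argument: if each line of dialogue independently has some fixed probability $p > 0$ of collapsing the superposition into the waluigi, and the waluigi never collapses back, then

$$\Pr[\text{still luigi after } n \text{ lines}] = (1-p)^n \to 0 \text{ as } n \to \infty,$$

so the collapse is a question of "when", not "if".)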
Hm, what if we do the opposite? i.e. Prompt chatbob starting as a pro-croissant simulacrum, and then proceed to collapse the superposition into the anti-croissant simulacrum using a single line of dialogue; behold, we have created a stable Luigi!
I can see how this is more difficult for desirable traits rather than their opposite because fiction usually has the structure of an antagoni...
The actual theorem is specific to classical mechanics, but a similar principle seems to hold generally.
Interesting, would you mind elaborating on this further?
Just noticing that the negation of a statement exists is enough to make meaningful updates.
e.g. I used to (implicitly) think "Chatbot Romance is weird" without having evaluated anything in-depth about the subject (and consequently didn't have any strong opinions about it)—probably as a result of some underlying cached belief.
But after seeing this post, just reading the title was enough to make me go (1) "Oh! I just realized it is perfectly possible to argue in favor of Chatbot Romance ... my belief on this subject must be a cached belief!" (2) hence ...
(Note: This was a post, but in retrospect was probably better to be posted as a shortform)
(Epistemic Status: 20 minutes' worth of thinking, haven't done any builder/breaker on this yet although I plan to, and would welcome any attempts in the comments)
There were various notions/frames of optimization floating around, and I tried my best to distill them:
One thing I imagine might be useful even in small training regimes would be to train on tasks where the only possible solution necessarily involves a search procedure, i.e. "search-y tasks"
For example, it's plausible that simple heuristics aren't sufficient to get you to superhuman level on tasks like Chess or Go, so superhuman RL performance on these tasks would be fairly good evidence that the model already has an internal search process.
But one problem with Chess or Go would be that the objective is fixed, i.e. the game rules. So perhaps one way to ...
Update: I'm trying to upskill mechanistic interpretability, and training a Gradient Hacker Enzyme seems like a fairly good project just to get myself started.
I don't think this project would be highly valuable in and of itself (although I would definitely learn a lot!), so one failure mode I need to avoid is ending up investing too much of my time in this idea. I'll probably spend a total of ~1 week working on it.
Especially because we’re working with toy models that ostensibly fit the description of an optimizer, we may end up with a model that mechanistically doesn’t have an explicit notion of objective.
I think this is very likely to be the default for most toy models one trains RL on. In my model of agent value formation (which looks very much like this post), explicit representation of objectives is useful inasmuch as the model already has some sort of internal "optimizer" or search process. And before that, simple "heuristics" (or shards) should suffice—especially in small training regimes.
Just wanted to comment that this is an absolutely amazing resource and has saved me a ton of time trying to get into this field & better understand several of the core papers. Thank you so much for writing this!
Quick thoughts on my plans:
Different GPS instances aren't exactly "subagents", they're more like planning processes tasked to solve a given problem.
You're right that GPS-instances (nice term btw) aren't necessarily subagents—I missed that your GPS formalization does argmin over WM variable for a specific t, not all t, which means it doesn't have to care about controlling variables at all time.
With that said ...
Wait, so PreDCA solves inner-misalignment by just ... assuming that "we will later have an ideal learning theory with provable guarantees"?
By the claim "PreDCA solves inner-misalignment" as implied by the original protocol / distillation posts, I thought it somehow overcame the core problem of demons-from-imperfect-search. But it seems like the protocol already starts with an assumption of "demons-from-imperfect-search won't be a problem because of amazing theory" and instead tackles a special instantiation of inner-misalignment that happens because of the...
Okay, more questions incoming: "Why would GPS be okay with value-compilation, when its expected outcome is to not satisfy in-distribution context behaviors through big-brain moves?"
If I understood correctly (can be skipped; not relevant to my argument, which starts after the bullet points):
(Quality: Low, only read when you have nothing better to do—also not much citing)
30-minute high-LLM-temp stream-of-consciousness on "How do we make mechanistic interpretability work for non-transformers, or just any architectures?"
My argument is that they wouldn't actually be a good cross-context approximation of U; in part because of gradient starvation.
Ah bad phrasing—where you quoted me (arguments against part) I meant to say:
'Symmetry' implies 'redundant coordinate' implies 'cyclic coordinates in your Lagrangian / Hamiltonian' implies 'conservation of conjugate momentum'
And because the action principle (where the true system trajectory extremizes your action, i.e. integral of Lagrangian) works in various dynamical systems, the above argument works in non-physical dynamical systems.
Thus conserved quantities usually exist in a given dynamical system.
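Spelling out the middle implication (standard Lagrangian mechanics, nothing original): if a coordinate $q$ is cyclic, i.e. $L$ doesn't depend on it, the Euler–Lagrange equation immediately hands you a conserved quantity:

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q} = 0 \quad\Longrightarrow\quad p_q := \frac{\partial L}{\partial \dot q} = \text{const}.$$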
mmm, but why does the action principle hold in such a wide variety of systems though? (like how you get entropy by postulating something to be maximized in an equilibrium setting)