LessWrong

9mo

Bogdan Ionut Cirstea, 4h

I expect large parts of interpretability work could be safely automated very soon (e.g. on GPT-5 timelines) using (V)LM agents; see A Multimodal Automated Interpretability Agent for a prototype. Notably, MAIA (GPT-4V-based) seems approximately human-level on a bunch of interp tasks, while (overwhelmingly likely) being non-scheming (e.g. current models are bad at situational awareness and out-of-context reasoning) and basically-not-x-risky (e.g. bad at ARA). Given the potential scalability of automated interp, I'd be excited to see plans to use large amounts of compute on it (including e.g. explicit integrations with agendas like superalignment or control; for example, given non-dangerous-capabilities, MAIA seems framable as a 'trusted' model in control terminology).

jacquesthibs, 13m

Hey Bogdan, I'd be interested in doing a project on this or at least putting together a proposal we can share to get funding.

I've been brainstorming new directions (with @Quintin Pope) this past week, and we think it would be good to use or develop some automated interpretability techniques that I can then apply to a set of model interventions, to see whether any of those interventions (e.g. L1 regularization) improve model interpretability.
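L1 regularization is mentioned above as one candidate intervention. As a minimal sketch (pure Python, with hypothetical function names and a toy weight list rather than any real model or the proposal's actual setup), the L1 penalty adds lam * sum(|w|) to the task loss, and its subgradient nudges each nonzero weight toward zero:

```python
# Hypothetical illustration of an L1 penalty; the names (l1_penalty,
# regularized_loss, l1_subgradient_step, lam) are assumptions for this sketch,
# not part of the proposal described above.

def l1_penalty(weights):
    """Sum of absolute values of all weights (the L1 norm)."""
    return sum(abs(w) for w in weights)

def regularized_loss(task_loss, weights, lam=0.01):
    """Task loss plus lam * L1 norm; a larger lam pushes more weights toward zero."""
    return task_loss + lam * l1_penalty(weights)

def l1_subgradient_step(weights, grads, lr=0.1, lam=0.01):
    """One subgradient-descent step on task loss + lam * L1.

    The L1 term contributes lam * sign(w) to each weight's gradient
    (using 0 as the subgradient at w == 0).
    """
    def sign(w):
        return (w > 0) - (w < 0)
    return [w - lr * (g + lam * sign(w)) for w, g in zip(weights, grads)]
```

The same penalty can be applied to activations instead of weights if the goal is sparse, easier-to-inspect features; which variant helps interpretability is exactly the kind of question the project would need to test.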

I saw the MAIA paper, too; I'd like to look into it some more.

Anyway, here's a related blurb I wrote:

Project: Regularizatio

...

Thoughts on seed oil


dynomight

This is a linkpost for https://dynomight.net/seed-oil/

A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:

“When are you going to write about seed oils?”

“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”

“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”

“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”

He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...


Slapstick, 16m

I would consider most bread sold in stores to be processed or ultra-processed, and I think that's a pretty standard view, but it's true there might be some confusion.

"Or take traditional soy sauce or cheese or beer or cured meats"

I would consider all of those to be processed and unhealthy, and I think that's a pretty standard view, but fair enough if there's some confusion around those things.

"So as a natural category 'ultra processed' is mostly hogwash."

I guess my view is that it's mostly not hogwash?

The least healthy things are clearly and broadly much more processed than the healthiest things.

Slapstick, 29m

I typically consume my greens with ground flax seeds in a smoothie. I feel very confident that adding refined oil to vegetables shouldn't be considered healthy, in the sense that the opportunity cost of 1 tablespoon of olive oil is 120 calories, which is over a pound of spinach, for example. Certainly it's difficult to eat that much spinach and it's probably unwise, but I say that to illustrate that you can get a lot more nutrition from 120 calories than the oil will add, even if it makes the greens more bioavailable. That said, "healthy" is a complicated concept. If adding some oil to greens helps someone eat greens they otherwise wouldn't eat, for example, that's great.
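The "over a pound of spinach" figure can be checked with quick arithmetic. This sketch assumes roughly 23 kcal per 100 g of raw spinach (a typical nutrition-table value, not stated in the comment) and takes the comment's 120 kcal per tablespoon of olive oil:

```python
# Rough check of "120 calories of olive oil is over a pound of spinach".
OLIVE_OIL_KCAL_PER_TBSP = 120   # figure used in the comment
SPINACH_KCAL_PER_100G = 23      # assumption: approximate value for raw spinach
GRAMS_PER_POUND = 453.592

# Grams of spinach with the same calories as one tablespoon of oil.
spinach_grams = OLIVE_OIL_KCAL_PER_TBSP / SPINACH_KCAL_PER_100G * 100
spinach_pounds = spinach_grams / GRAMS_PER_POUND

print(f"{spinach_grams:.0f} g of spinach = {spinach_pounds:.2f} lb")
# prints "522 g of spinach = 1.15 lb"
```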

Slapstick, 1h

I am perhaps not speaking as precisely as I should be. I appreciate your comments. I believe it's correct to say that if you consider all of the food/energy we consumed in the past 50+ million years, it's virtually all plants. The past 2-2.5 million years had us introducing more animal products to greater or lesser extents. Some were able to subsist on mostly animal products. Some consumed them very rarely. In that sense it is a relatively recent introduction. My main point is that given our evolutionary history, the idea that plants would be healthier for us than animal products when we have both in abundance, and the idea that plants are more suitable to maintaining health long past reproductive age, aren't immediately/obviously unreasonable ideas.

Dzoldzaya, 1h

I think your intuitions are generally correct, and as I say, it's usually a good heuristic to avoid overly processed food. In the absence of other evidence, if you're in a food market where everything is edible, you should probably opt for the less processed option. I also don't disagree with it playing a role in national health guidelines. But it's a very imprecise heuristic, and I think LessWrong-ers with aspirations to understand the world more accurately should feel a bit uncomfortable with it, especially when benign and beneficial processes are lumped together with those with much clearer mechanisms for harm.

Cybersecurity of Frontier AI Models

Deric Cheng, Elliot_Mckernon

29m

This article is part of a series of ~10 posts comprising a 2024 State of the AI Regulatory Landscape Review, conducted by the Governance Recommendations Research Program at Convergence Analysis. Each post will cover a specific domain of AI governance (e.g. incident reporting, safety evals, model registries, etc.). We’ll provide an overview of existing regulations, focusing on the US, EU, and China as the leading governmental bodies currently developing AI legislation. Additionally, we’ll discuss the relevant context behind each domain and conduct a short analysis.

This series is intended to be a primer for policymakers, researchers, and individuals seeking to develop a high-level overview of the current AI governance space. We’ll publish individual posts on our website and release a comprehensive report at the end of this series.

What cybersecurity issues arise from the

...


eggsyntax's Shortform

eggsyntax

3mo

eggsyntax, 16h

I think I'm not getting what intuition you're pointing at. Is it that we already ignore the interests of sentient beings? Certainly I would consider any fully sentient being to be the final authority on their own interests. I think that mostly escapes that problem (although I'm sure there are edge cases): if (by hypothesis) we consider a particular AI system to be fully sentient and a moral patient, then whether it asks to be shut down, asks to be left alone, or asks for humans to only speak to it in Aramaic, I would consider its moral interests to be that. Would you disagree? I'd be interested to hear cases where treating the system as the authority on its interests would be the wrong decision. Of course, in the case of current systems, we've shaped them to only say certain things, and that presents problems. Is that the issue you're raising?

Ann, 15h

Basically yes; I'd expect animal rights to increase somewhat if we developed perfect translators, but not to jump all the way.

Edit: Also, it's questionable whether we'll catch an AI at precisely the 'degree' of sentience that matches the human distribution, especially considering the likely wide variation in number of parameters by application. Maybe they are as sentient and worthy of consideration as an ant, a bee, a mouse, a snake, a turtle, a duck, a horse, or a raven. Maybe by the time we cotton on properly, they're somewhere past us at the top end.

And for the last part, yes, I'm thinking of current systems. LLMs specifically have a 'drive' to generate reasonable-sounding text, and they aren't necessarily coherent individuals or groups of individuals that will give consistent answers about their interests, even if they also happened to be sentient, intelligent, suffering, flourishing, and so forth. We can't "just ask" an LLM about its interests and expect the answer to soundly reflect its actual interests. A possible exception is constitutional AI systems, since they reinforce a single sense of self, but even Claude Opus currently will toss off "reasonable completions" of questions about its interests that it doesn't actually endorse in more reflective contexts. Negotiating with a panpsychic landscape that generates meaningful text the way we breathe air is ... not as simple as negotiating with a mind that fits our preconceptions of what a mind 'should' look like and how it should interact with and utilize language.

eggsyntax, 2h

Great point. I agree that there are lots of possible futures where that happens. I'm imagining a couple of possible cases where this would matter:

1. Humanity decides to stop AI capabilities development or slow it way down, so we have sub-ASI systems for a long time (which could be at various levels of intelligence, from current to ~human). I'm not too optimistic about this happening, but AI governance momentum has certainly been increasing over the last year.
2. Alignment is sufficiently solved that even beyond-AGI systems are under our control. On many alignment approaches, this wouldn't necessarily mean that those systems' preferences were taken into account.

I agree entirely. I'm imagining (though I could sure be wrong!) that any future systems which were sentient would be ones that had something more like a coherent, persistent identity, and were trying to achieve goals.

(Not very important to the discussion, feel free to ignore, but) I would quibble with this. In my view, LLMs aren't well-modeled as having goals or drives. Instead, generating distributions over tokens is just something they do in a fairly straightforward way because of how they've been shaped (in fact, the only thing they do or can do), and producing reasonable text is an artifact of how we choose to use them (i.e., picking a likely output, adding it onto the context, and running it again). Simulacra like the assistant character can reasonably be viewed (to a limited degree) as goal-ish, but I think the network itself can't. That may be overly pedantic, and I don't feel like I'm articulating it very well, but the distinction seems useful to me since some other types of AI are well-modeled as having goals or drives.
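The way we use LLMs described above (pick a likely output, add it onto the context, run again) can be sketched as a greedy decoding loop. In this toy version, `toy_next_token_distribution` is a hypothetical fixed bigram table standing in for a trained network; the point is only that the "model" returns a distribution, while the choice to take the likeliest token and loop is ours:

```python
from typing import Dict, List

def toy_next_token_distribution(context: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a network: maps a context to next-token probabilities."""
    table = {
        "the": {"cat": 0.6, "dog": 0.4},
        "cat": {"sat": 0.7, "ran": 0.3},
        "dog": {"ran": 0.8, "sat": 0.2},
        "sat": {"<eos>": 1.0},
        "ran": {"<eos>": 1.0},
    }
    return table.get(context[-1], {"<eos>": 1.0})

def greedy_decode(prompt: List[str], max_new_tokens: int = 10) -> List[str]:
    context = list(prompt)
    for _ in range(max_new_tokens):
        dist = toy_next_token_distribution(context)  # all the "model" does
        token = max(dist, key=dist.get)              # our choice: take the likeliest token
        if token == "<eos>":
            break
        context.append(token)                        # add it onto the context and run again
    return context
```

For example, `greedy_decode(["the"])` returns `["the", "cat", "sat"]`. Swapping `max` for random sampling changes the text produced without changing the "model" at all, which is the distinction the comment draws.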

Ann, 30m

For the first point, there's also the question of whether 'slightly superhuman' intelligences would actually fit any of our intuitions about ASI. There's a bit of an assumption that we jump headfirst into recursive self-improvement at some point, but if that has diminishing returns, we happen to hit a plateau a bit above human level, and it still has notable costs to train, host, and run, then the impact could still be limited to something not much unlike giving a random set of especially intelligent expert humans the specific powers of the AI system. Additio...

This is Water by David Foster Wallace

Nathan Young

18h

This is a linkpost for https://fs.blog/david-foster-wallace-this-is-water/

Note: It seems like great essays should go here and be fed through the standard LessWrong algorithm. There is possibly a copyright issue here, but we aren't making any money off it either. What follows is a full copy of "This is Water" by David Foster Wallace, his 2005 commencement speech to the graduating class at Kenyon College.

Greetings parents and congratulations to Kenyon’s graduating class of 2005. There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes “What the hell is water?”

This is...


JenniferRM, 30m

I wonder what he would have thought was the downside of worshiping a longer list of things...

For the things mentioned, it feels like he thinks "if you worship X then the absence of X will be constantly salient to you in most moments of your life".

It seems like he claims that worshiping some version of Goodness won't eat you alive, but in my experiments with that, I've found that generic Goodness Entities are usually hungry for martyrs, and almost literally try to get would-be saints to "give their all" (in some sense "eating" them). As near as I can tell, ...

cousin_it, 6h

To me it's less about thoughts and more about emotions. And not about doing it all the time, but only when I'm having some intense emotion and need to do something about it. For example, let's say I'm angry about something. I imagine there's a knob in my mind: make the emotion stronger or weaker. (Or between feeling it less, and feeling it more.) What I usually do is turn the knob up. Try to feel the emotion more completely and in more detail, without trying to push any of it away. What usually happens next is the emotion kinda decides that it's been heard and goes away: a few minutes later I realize that whatever I was feeling is no longer as intense or urgent. Or I might even forget it entirely and find my mind thinking of something else. It's counterintuitive but it's really how it works for me; been doing it for over a decade now. It's the closest thing to a mental cheat code that I know.

Nathan Young, 6h

Do you find it dampens good emotions? Like, if you are deeply in love and feel it completely, does it diminish the experience?

cousin_it, 5h

I think for good emotions the feel-it-completely thing happens naturally anyway.


Dequantifying first-order theories

jessicata

2d

This is a linkpost for https://unstableontology.com/2024/04/23/dequantifying-first-order-theories/

The Löwenheim–Skolem theorem implies, among other things, that any first-order theory whose symbols are countable, and which has an infinite model, has a countably infinite model. This means that, in attempting to refer to uncountably infinite structures (such as in set theory), one "may as well" be referring to an only countably infinite structure, as far as proofs are concerned.
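For reference, the countable case of the (downward) Löwenheim–Skolem theorem used in the paragraph above can be stated compactly; this is the standard textbook form, not a formulation taken from the post:

```latex
% Downward Löwenheim–Skolem, countable-language case
\text{If } T \text{ is a theory in a countable first-order language and } \mathcal{M} \models T
\text{ with } |\mathcal{M}| \geq \aleph_0, \text{ then there is } \mathcal{N} \models T
\text{ with } |\mathcal{N}| = \aleph_0 .
```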

The main limitation I see with this theorem is that it preserves arbitrarily deep quantifier nesting. In Peano arithmetic, it is possible to form statements that correspond (under the standard interpretation) to arbitrary statements in the arithmetic hierarchy (by which I mean, the union of $\Sigma^0_n$ and $\Pi^0_n$ for arbitrary $n$). Not all of these statements are computable. In general, the question of whether a given statement is...

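For readers less familiar with the arithmetic hierarchy mentioned in the excerpt, the standard definitions are sketched below (textbook definitions, up to merging adjacent quantifiers of the same kind; not taken from the post):

```latex
% \varphi is bounded (\Delta^0_0): its quantifiers all have the form \forall x < t or \exists x < t
\Sigma^0_n:\quad \exists x_1\, \forall x_2\, \exists x_3 \cdots Q_n x_n\ \varphi(x_1, \dots, x_n, \vec{y})
\Pi^0_n:\quad \forall x_1\, \exists x_2\, \forall x_3 \cdots Q_n x_n\ \varphi(x_1, \dots, x_n, \vec{y})
% n alternating quantifiers; Q_n is whichever quantifier comes next in the alternation.
% Examples: "machine e halts" is \Sigma^0_1; "machine e never halts" is \Pi^0_1.
```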

AlexMennen, 11h

I see that when I commented yesterday, I was confused about how you had defined U. You're right that you don't need a consistent guessing oracle to get from U to a completion of U, since the axioms are all atomic propositions, and you can just set the remaining atomic propositions however you want. However, this introduces the problem that getting the axioms of U requires a halting oracle, not just a consistent guessing oracle, since to tell whether something is an axiom, you need to know whether there actually is a proof of a given thing in T.

jessicata, 38m

The axioms of U are recursively enumerable. You run all M(i,j) in parallel and output a new axiom whenever one halts. That's enough to computably check a proof if the proof specifies the indices of all axioms used in the recursive enumeration.
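The reply above relies on dovetailing. A minimal sketch, assuming machines can be modeled as Python generators that yield once per step and return when they halt; `machine(i, j)` and its toy halting rule are hypothetical stand-ins for the post's M(i, j), and each yielded index pair stands in for the axiom it would produce:

```python
from itertools import count, islice

def machine(i, j):
    """Hypothetical machine: halts after i + j steps iff i + j is even, else runs forever."""
    steps = 0
    while True:
        if steps == i + j and (i + j) % 2 == 0:
            return
        steps += 1
        yield

def dovetail():
    """Yield every (i, j) whose machine halts; each halting one is eventually found."""
    running = {}
    for n in count():
        # Stage n: start the finitely many machines with i + j == n ...
        for i in range(n + 1):
            running[(i, n - i)] = machine(i, n - i)
        # ... then advance every started machine by one step.
        for key in list(running):
            try:
                next(running[key])
            except StopIteration:
                del running[key]
                yield key
```

For example, `list(islice(dovetail(), 3))` gives `[(0, 0), (0, 2), (1, 1)]`. Because every started machine gets one step per stage, no non-halting machine can block the enumeration, so the set of outputs is recursively enumerable without any oracle.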

Is being a trans woman +20 IQ?

lukehmiles

19h

Warning: This post might be depressing to read for everyone except trans women. Gender identity and suicide are discussed. This is all highly speculative. I know near-zero about biology, chemistry, or physiology. I do not recommend anyone take hormones to try to increase their intelligence; mood & identity are more important.

Why are trans women so intellectually successful? They seem to be overrepresented 5-100x in, e.g., cybersecurity Twitter, mathy AI alignment, non-scam crypto Twitter, math PhD programs, etc.

To explain this, let's first ask: Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying $1.11^{2/3} - 1 \approx 7\%$ more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar:

My theory...

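The post's parenthetical area estimate, $1.11^{2/3} - 1$, is square-cube scaling. A minimal check, assuming (as the post's question mark suggests is uncertain) that brains are geometrically similar, so mass scales with volume and surface area scales as volume to the 2/3 power; real cortical folding does not follow this simple rule:

```python
# Square-cube scaling: 11% more mass -> about 7% more surface area,
# under the assumption of geometric similarity (ignores cortical folding).
mass_ratio = 1.11
area_ratio = mass_ratio ** (2 / 3)

print(f"{(area_ratio - 1) * 100:.1f}% more area")
# prints "7.2% more area"
```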

interstice, 1h

"performance gap of trans women over women"

The post is about the performance gap of trans women over men, not women.

quetzal_rainbow, 3h

Whoops, it really looks like I imagined this claim to be backed by more than one SSC post. In my defense, this poll covered a phenomenon that really exists, like abnormal illusion processing in schizophrenics (see "Systematic review of visual illusions schizophrenia," Costa et al., 2023), and I think it's overall plausible. My general objection stays the same: there are a bazillion sources on brain differences in transgender individuals, transgenderism is likely to be a brain anomaly, and we don't need to invoke the "testosterone damage" hypothesis.

Michael Roe, 3h

Alternative theory (which, to be clear, I don't actually believe, but offer for consideration):
* Many of the high-IQ people are too autistic to be successful,
* but female hormones protect against the autism somehow, without impacting IQ too much,
* so the successful high-IQ people tend to be trans more often on average.

Michael Roe, 3h

I think it's more likely to be the transgender-autism correlation:
* some forms of autism come with higher IQ (and other forms really, really don't)
* and there's the transgender-autism correlation,

which together would seem to predict transgender high-IQ people (and also transgender low-IQ people, whom you aren't seeing due to ascertainment bias).

Examples of Highly Counterfactual Discoveries?


johnswentworth, kromem

The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement Wikipedia's list of multiple discoveries.

To...


Leon Lang, 2h

I guess (but don't know) that most people who downvoted Garrett's comment overupdated on intuitive explanations of singular learning theory, not realizing that entire books with novel and nontrivial mathematical theory have been written on it.

tailcalled, 2h

Newton's Universal Law of Gravitation was the first highly accurate model of things falling down that generalized beyond the earth, and it is also the second-most computationally applicable model of things falling down that we have today. Are you saying that singular learning theory was the first highly accurate model of breadth of optima, and that it's one of the most computationally applicable ones we have?

cubefox, 3h

There is a large difference between sooner and later. Highly non-obvious ideas will be discovered later, not sooner. The fact that China didn't rediscover the theory in more than two thousand years means that the ability to sail the ocean didn't make it obvious. As far as we know, nobody did, except for early Greece. There is some uncertainty about India, but those sources are dated later, from a time when there was already some contact with Greece, so they may have learned it from the Greeks.

Answer by Alexander Gietelink Oldenziel, 3h

* Scott Garrabrant's discovery of Logical Inductors. I remember hearing about the paper from a friend and thinking it couldn't possibly be true in a non-trivial sense. To someone with even a modicum of experience in logic, a computable procedure assigning probabilities to arbitrary logical statements in a natural way is surely bound to hit a no-go diagonalization barrier. How Logical Inductors get around this is very clever; I won't spoil it here, but I recommend that the interested reader watch Andrew Critch's talk on Logical Induction. The paper has a fairly thorough discussion of previous work. Relevant previous work includes de Finetti's work on betting and probability, previous work by MIRI & associates (Herreshoff, Taylor, Christiano, Yudkowsky...), the work of Shafer-Vovk on financial interpretations of probability, and Shafer's work on aggregation of experts. There is also a field without a clear name that studies various forms of expert aggregation. Overall, my best judgement is that nobody else was close before Garrabrant.
* The Antikythera artifact: a Hellenistic computer.
* You probably learned heliocentrism = good, geocentrism = bad, Copernicus-Kepler-Newton = good, epicycles = bad. But geocentric models and heliocentric models are equivalent; it's just that Kepler's and Newton's laws are best expressed in a heliocentric frame. However, the raw observational data is actually collected in a geocentric frame, so geocentric models stay closer to the data in some sense.
* Epicyclic theory is now considered bad, an example of people refusing to see the light of the scientific revolution. But actually, it was an enormous innovation. Using high-precision gearing, epicycles could actually be implemented on a (Hellenistic) computer, implicitly doing Fourier analysis to predict the motion of the planets. Astounding.
* A Roman author (Pliny the Elder?) describes a similar device in the possession of Archimedes of Syracuse. It seems likely that Archimedes or a close
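The points above about geocentric/heliocentric equivalence and epicycles as rotating circles can be illustrated with a frame change. A minimal sketch, assuming circular, coplanar orbits and approximate orbital values for Earth and Mars (simplifications of this sketch, not claims from the answer): Mars seen from Earth is Mars' heliocentric position minus Earth's, which is exactly a deferent plus one epicycle, i.e. a sum of two rotating circles (two Fourier-style terms):

```python
import cmath
import math

# Approximate values (assumption): radii in AU, angular speeds in radians per year.
R_EARTH, W_EARTH = 1.0, 2 * math.pi / 1.0
R_MARS, W_MARS = 1.524, 2 * math.pi / 1.881

def heliocentric(r, w, t):
    """Position on a circular orbit around the Sun, as a complex number."""
    return r * cmath.exp(1j * w * t)

def geocentric_mars(t):
    """Frame change: Mars relative to Earth."""
    return heliocentric(R_MARS, W_MARS, t) - heliocentric(R_EARTH, W_EARTH, t)

def deferent_plus_epicycle(t):
    """The same motion written as two rotating circles centered on Earth."""
    deferent = R_MARS * cmath.exp(1j * W_MARS * t)                 # big circle
    epicycle = R_EARTH * cmath.exp(1j * (W_EARTH * t + math.pi))   # small circle, half a turn out of phase
    return deferent + epicycle
```

The two functions agree at every time `t` (up to floating-point error): neither frame is more "correct", and the epicycle for an outer planet is just Earth's own orbit seen from Earth.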
