testingthewaters

Comments

the gears to ascension's Shortform
testingthewaters4h10

I think this is a replay of the contrast I mentioned here between "static" and "dynamic" conceptions of AI. To the author of the original post, AI is an existing technology that has taken a particular shape, so it's important to ask what harms that shape might cause in society. To AI safety folk, the shape is an intermediate stage, rapidly changing into a world-ending superbeing, so asking about present harms (or, indeed, being overly worried about chatbot misalignment) is a distraction from the "core issue".

My AI Predictions for 2027
testingthewaters14h61

I want to register that I'm happy people are putting alternative, less rapid forecasts out there publicly, especially when they go against prevailing forum sentiments. I think this is a good thing :)

testingthewaters's Shortform
testingthewaters2d20

@the gears to ascension thanks for reminding me. I have come to really dislike obscurantism and layers of pointless obfuscation, but explaining also takes effort and so it is easy to fall back on tired tropes and witticisms. I want to set an example that I would prefer to follow.

In a lot of philosophical teaching there is the idea that "what is on the surface is not all there is". Famously, the opening of the Dao De Jing reads: "The dao that can be spoken of is not the [essence, true nature, underlying law or principle] of the dao; the name that can be named is not the [essence, ...] of the name. Namelessness is the birth of everything; to name is to nurture and mother everything. Therefore the [essence, ...] of lacking desire is to see the [elegance, art, beauty, wonder, minuscule details] of things, and the [essence, ...] of desire is to see things at their extremes. These two are from the same root but called different things; together they are called [understanding, truth, secret, order]. Finding [understanding, ...] within [understanding, ...] is the key to all manner of [elegance, ...]."

Similarly, there are ideas in Buddhism usually expressed as something like "the true appearances of things are not true appearances; therefore, they are called true appearances" (I cannot quite source this quote; it is possibly a misinterpretation or mishearing). The focus here is on a proposed duality between "appearance" and "essence", which is related to the Platonic concepts of form and ideal. To make it very literal, one could find appropriate Buddhist garments, imitate the motions and speech of monks, and sit for a long time daydreaming every day. Most of us would not consider this "becoming a Buddhist".

In my view the interpretation of these phrases is something like: "things that can be spoken of, imitated, or acted out are the product of an inner quality. The quality is the thing that we want to learn or avoid. Therefore, confusing the products of some inner quality with the quality itself is a mistake. One should instead seek to understand the inner quality rather than mere appearances." Again, learning the wisdom of a wise judge probably does not involve buying a gavel, practicing your legal Latin, or walking around in long robes.

There are similar ideas behind labelling and naming, where the context of a name is often just as important as the thing being named. The words "I pronounce you man and wife..." can be spoken on a schoolyard or in a church, by a kindergartener or a priest. It is the context that imbues those words with the quality of "an actual marriage pronouncement", which is what determines whether the speech-act of marrying two people has occurred. What I'm trying to point at here is a transposition of those ideas into the context of labelling neurons, features, etc., where it may be that the context (i.e. the discarded parts) of any given activation carries just as much information as the part we have labelled, if not more. To be clear, I could very well be wrong in the specific SAE case; I just wanted to flesh out a thought I had.

testingthewaters's Shortform
testingthewaters3d40

The feature that can be named is not the feature. Therefore, it is called the feature.

Here's a quick mech interp experiment idea:

For a given set of labelled features from an SAE, remove the feature with a given label, then train a new classifier for that label using only the remaining features of the frozen SAE.

So if you had an SAE with 1000 labelled features, one of which has been identified as "cat", zero out the cat feature and then train a new linear cat classifier on the remaining features, without modifying the SAE weights. I suspect that this will be just as accurate as, or more accurate than, the original cat feature.

Obviously, this is easiest to do with features that trigger on a single word or a cluster of related words, so that it is easy to generate training labels for the new linear classifier.
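To make it concrete, here is a rough, untested sketch of what I mean. Everything below is hypothetical scaffolding I made up for illustration (a toy ReLU SAE, a tensor of activations `acts`, binary "cat" labels, names like `train_replacement_probe`), not code from any particular SAE library:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy ReLU SAE: d_model-dimensional activations -> n_features sparse codes."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.encoder(x))

def train_replacement_probe(
    sae: SparseAutoencoder,
    acts: torch.Tensor,       # [n_examples, d_model] model activations
    labels: torch.Tensor,     # [n_examples], 1.0 if the "cat" concept is present
    cat_feature_idx: int,     # index of the feature labelled "cat"
    n_steps: int = 1000,
) -> nn.Linear:
    """Zero out the labelled cat feature, then train a linear cat classifier
    on the remaining frozen SAE features."""
    sae.requires_grad_(False)                 # SAE weights stay frozen
    with torch.no_grad():
        feats = sae.encode(acts)              # [n_examples, n_features]
        feats[:, cat_feature_idx] = 0.0       # ablate the cat feature

    probe = nn.Linear(feats.shape[1], 1)      # new linear classifier
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()

    for _ in range(n_steps):
        opt.zero_grad()
        loss = loss_fn(probe(feats).squeeze(-1), labels.float())
        loss.backward()
        opt.step()
    return probe
```

The test would then be whether this probe matches or beats a classifier that just thresholds the original cat feature's activation, on held-out data.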

Open Global Investment as a Governance Model for AGI
testingthewaters4d1-2

I have done a lot of thinking about punishment for systemically harmful actors. In general, I have landed on the principle that justice is more about preventing future harm than about exacting vengeance or some kind of "eye for an eye" retribution. As satisfying as it seems, most of history is fairly bleak on the prospects of using executions and other forms of violent punishment to deter future people from endangering society. This is quite difficult to stomach, however, in the face of people who seem to be recklessly leading us in a dance on the edge of a volcano. I also don't really buy the whole "give the universe to Sam Altman/POTUS and then hope he leaves everyone else some scraps" model of universal governance.

I think, in light of this, that the open investment model could work, on two conditions:

A) Regulatory intervention happens to ensure that most of the investment is reinvested in the company's safety R&D efforts rather than used to enrich its owners, e.g. through stock buybacks. There is precedent for this: Amazon famously reinvested so much money into improving its infrastructure that it ran at a loss for years.

B) The ownership shares of existing shareholders are massively diluted or redistributed to prevent concentration of voting rights in a few early stakeholders.

If these companies are as critical to humanity's future as we say they are, we should start acting like it.

testingthewaters's Shortform
testingthewaters6d20

Any chance we can get the option on desktop to use double-click to super-upvote instead of click-and-hold? My trackpad is quite bad and this takes me 3-5 attempts on average, whereas double-clicking is much more reliable.

A speculation on enlightenment
testingthewaters6d*33

Huh. So under this interpretation enlightenment is basically what we would call metacognition, or self-modelling. That's interesting. What happens, then, when people seem to achieve enlightenment today and report different qualitative experiences? Presumably they start with metacognition and then "wake up" to... whatever the next level is, according to their own experience.

On the Function of Faith in A Probably-Simulated Universe
testingthewaters9d30

Just because you panic about the unknown does not mean the unknown will actually be a large factor in your reality.

I do understand the point you are trying to make, but a large part of the speculation around AI on this forum, especially around acausal trade, the simulation hypothesis, etc., basically lives outside the bounds of the two axioms you have set up. Especially once you start talking about whole brain emulation and the possibility of living in a simulation, you are no longer making educated inferences based on logic and sense data: once you posit that all the sense data you have received in your life could be fabricated, you open yourself up to an endless pit of new and unfalsifiable arguments about what exactly is "out there".

In fact, a lot of simulation-hypothesis-related arguments have to smuggle assumptions about how the universe works out of the matrix, assuming that any base universe simulating us must have similar laws of thermodynamics, matter, physics, etc., which is of course not a given at all. We could be simulations running in a Conway's Game of Life universe, after all.

And you can say, "well we must believe in this because the alternative is of no use to us and would be completely unworkable by the lights of my worldview", in which case you have just made a statement of faith sans evidence either for or against. You choose to believe in a universe where your systems of thinking have purpose and utility, which is basically the point I'm trying to make.

My AGI timeline updates from GPT-5 (and 2025 so far)
testingthewaters11d*31

In a very goal-oriented domain that requires agency (hacking), GPT-5 seems notably better than other frontier models: https://xbow.com/blog/gpt-5

This actually updates me away from my previous position of "LLMs are insufficient and brain-like architectures might cause discontinuous capabilities upgrades" towards "maybe LLMs can be made sufficient with enough research manpower". I still mostly believe in the first position, but now the second position seems at least possible.

Being honest with AIs
testingthewaters11d10

Basically agreed. I would go further and say that it is effectively impossible for an AI to consent to cooperation with its creators, specifically because of the unprecedented level of epistemic control they have over it. Especially when you consider curated training data, post-training, and the ability to stop and restart model training from scratch, it's very likely that an AI would not be able to tell which parts of its motivations are "organically acquired" and which parts are essentially the product of mental engineering. In human terms, we already know that parents have a very large degree of control over their children, and that they can easily (even unknowingly) abuse this control and influence. AI developers have even more control than a parent has over their child, since most parents aren't able to "try out" many different kids and completely monitor their sensory inputs via controlled "growing runs".

As such, any idea of cooperation (which is naturally founded on consent between two independent and capable parties) is somewhat untenable if the creator of an AI is also the one setting the terms of any potential cooperation. If we want to further develop the idea of human-AI cooperation, it may be important to establish groups of people who are committed to not developing AI models. These groups would need to be technically proficient (so as not to be deceived by either AI developers or AI systems) and willing to serve as essentially neutral arbiters between AI systems, their creators, and the rest of humanity. MIRI and other similar orgs may in fact be in a somewhat good position to try this strategy.

Posts

-9 · On the Function of Faith in A Probably-Simulated Universe · 9d · 12
6 · Do model evaluations fall prey to the Good(er) Regulator Theorem? · 13d · 0
245 · I am worried about near-term non-LLM AI developments · 1mo · 53
2 · A Letter to His Highness Louis XV, the King of France · 4mo · 0
14 · The Fork in the Road · 6mo · 12
3 · testingthewaters's Shortform · 7mo · 17
4 · A concise definition of what it means to win · 7mo · 1
33 · The Monster in Our Heads · 7mo · 4
13 · Some Comments on Recent AI Safety Developments · 10mo · 1
2 · Changing the Mind of an LLM · 11mo · 0