hot take: introspection isn't really real. you can't access your internal state in any meaningful sense beyond what your brain chooses to present to you (e.g. visual stimuli, emotions, etc.), for reasons outside of your direct control. when you think you're introspecting, what's really going on is that you have a model of yourself inside your brain, which you learn gradually by seeing yourself do certain things, experience certain stimuli or emotions, etc.
your self-model is not fundamentally special compared to any other models...
People often use the term "negative reinforcement" to mean something like punishment, where a teacher or trainer inflicts pain or uncomfortable deprivation on the individual being trained. Is this the sort of thing you mean? Is there anything analogous to pain or deprivation in AI training?
There's this concept around confidentiality and shooting the messenger that I keep coming back to, which I have not really been able to articulate well.
There are a lot of circumstances where I want to know a piece of information someone else knows. There are good reasons for them not to tell me, for instance if the straightforward, obvious thing for me to do with that information is against their interests. And yet there's an outcome that's better for me and either better or the same for them, if they tell me and I don't use it against them.
(Consider...
(That last paragraph is a pile of sazen and jargon; I don't expect it's very clear. I wanted to write this note because I'm not trying to score points via confusion, and I want to point out to any readers that it's very reasonable to be confused by that paragraph.)
Why is there so little discussion about the loss of status of stay-at-home parenting?
When my grandmother quit being a nurse to become a stay-at-home mother, it was seen as a great thing. She gained status over her sisters, who stayed single and in their careers.
When my mother quit her office role to become a stay-at-home mother, it was accepted, but not celebrated. She likely lost status in society due to her decision.
I am a mid-30s millennial, and I don't know a single woman who would leave her career to become a stay-at-home mother. They fear that their...
There's a difference between who plans to leave their career and who ends up leaving.
Some paths:
- childcare is more expensive than one partner earns after taxes, so it's cheaper for one parent to stay home.
- managing work / commute / child appointments (especially if they have special needs) / child sickness / childcare is so overwhelming that a parent quits their job to have fewer things to manage. Or they feel they're failing at the combination of work and parenting and must pick one.
- the family is financially secure enough they feel they can do ...
I think that I've historically underrated learning about historical events that happened in the last 30 years, compared to reading about more distant history.
For example, I recently spent time learning about the Bush presidency, and found learning about the Iraq war quite thought-provoking. I found it really easy to learn about things like the foreign policy differences among factions in the Bush admin, because e.g. I already knew the names of most of the actors and their stances are pretty intuitive/easy to understand. But I still found it interesting to ...
I have long thought that I should focus on learning history with a recency bias, since knowing about the approximate present screens off events of the past.
We get like 10-20 new users a day who write a post describing themselves as a case study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI-generated. The evidence usually looks like a sort of standard "prompt the LLM into roleplaying an emergently aware AI".
It'd be kinda nice if there was a canonical post specifically talking them out of their delusional state.
If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.
I used to think reward was not going to be the optimization target. I remember hearing Paul Christiano say something like "The AGIs, they are going to crave reward. Crave it so badly," and disagreeing.
The situationally aware reward hacking results of the past half-year are making me update more towards Paul's position. Maybe reward (i.e. reinforcement) will increasingly become the optimization target, as RL on LLMs is scaled up massively. Maybe the models will crave reward.
What are the implications of this, if true?
Well, we could end up in Control Wo...
I am talking about AIs similar to current-day systems, for some notion of "similar" at least. But I'm imagining AIs that are trained on lots more RL, especially lots more long-horizon RL.
I am just properly introducing myself to LessWrong today.
Some of you might know me, especially if you're active in Open Source AI movements like EleutherAI or Mozilla's 0din bug bounty program. I've been a lurker since my teenage years, but given my vocational interest in AI safety, I've decided to make an account using my real name and likeness.
Nice to properly reconnect.
By EoY 2025 I'll be done with my undergraduate degree, and I hope to pursue a Master's in International Relations with a focus on AI Safety, starting in Fall 2026 or later.
Also, my timelines are rather orthodox. I don't go by the AI 2027 projection, but rather by Ray Kurzweil's 2029 for AGI and 2045 for a true singularity event.
I'm happy to discuss further with anyone!
Prime Day (now not just an Amazon thing?) ends tomorrow, so I scanned Wirecutter's Prime Day page for plausibly-actually-life-improving purchases so you didn't have to (plus a couple of others I found along the way; this excludes tons of areas that I'm not familiar with, like women's clothing or parenting):
Seem especially good to me:
the core atrocity of today's social networks is that they make us temporally nearsighted. they train us to prioritize the short term.
happiness depends on attending to things which feel good long-term—over decades. But for modern social networks to make money, it is essential that posts are short-lived—only then do we scroll excessively and see enough ads to sustain their business.
It might go w/o saying that nearsightedness is destructive. When we pay more attention to our short-lived pleasure signals—from cute pics, short clips, outrageous news, hot actors...
In the past we weren't in spaces that wanted us so desperately to be, and so were designed to make us, single-minded consumers.
Workplaces, homes, dinners, parks, sports teams, town board meetings, doctors' offices, museums, art studios, walks with friends--all of these are settings that value you for being yourself and prioritizing long-term cares.
I think it's really only in spaces that want us to consume, and want us to consume cheap/oft-expiring things, that we're valued for consumerist behavior/short-term thinking. Maybe malls want us to be like this t...
Eliezer’s form of moral realism about good (as a real but particular shared concept of value which is not universally compelling to minds) seems to imply that most of us prefer to be at least a little bit evil, and can’t necessarily be persuaded otherwise through reason.
Seems right.
And Nietzsche would probably argue the two impulses towards good and evil aren't really opposites anyway.
I strongly agree with you, conditional on AI alignment with humanity's growth.
If you believe that AGI is possible and would be radically transformative, why cap things? We can have happy immortal minds, whether AI or human (transhumans, really).
Perhaps you're concerned with the way current AIs (LLMs like ChatGPT & Claude) are thought of as tools. This worries me as well, when I am not worrying over the possibility of an apocalypse.
Sometimes people think of "software-only singularity" as an important category of ways AI could go. A software-only singularity can roughly be defined as when you get increasing-returns growth (hyper-exponential) just via the mechanism of AIs increasing the labor input to AI capabilities software[1] R&D (i.e., keeping fixed the compute input to AI capabilities).
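One toy way to make the "increasing-returns (hyper-exponential)" clause concrete (my own sketch, not the author's model; the exponents $\alpha$ and $\phi$ are assumptions of the toy model, not numbers from the post): write the software/capability level as $A$ and the effective research labor as $L$, with AIs supplying labor in proportion to $A$ while compute is held fixed. Then
$$\dot{A} = c\,L^{\alpha}A^{\phi}, \qquad L \propto A \;\Rightarrow\; \dot{A} \propto A^{\alpha+\phi}.$$
If $\alpha+\phi \le 1$ you get at most exponential growth; if $\alpha+\phi > 1$ the solution blows up in finite time, which is the increasing-returns regime the term "software-only singularity" is pointing at.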
While the software-only singularity dynamic is an important part of my model, I often find it useful to more directly consider the outcome that software-only singularity might cause: the feasibi...
I am confused about why this post on the ethics of eating honey is so heavily downvoted.
It sparked a bunch of interesting discussion in the comments (e.g. this comment by Habryka and the resulting arguments on how to weight non-human animal experiences).
It resulted in at least one interesting top-level rebuttal post.
I assume it led indirectly to this interesting short post, also about how to weight non-human experiences. (This might not have been downstream of the honey post, but it's a weird coincidence if it isn't.)
I think the original post certainly had flaws,...
I feel like you haven't read much BB.
That is true. I have not, nor do I intend to.
These political asides are of a piece with the philosophical jabs and brags he makes in his philosophical essays.
That doesn't actually rebut my observation, unless you are claiming to have seen jibes and sneering as dumb and cliche as those in his writings from before ChatGPT (Nov 2022).
All the paths to superintelligence that could succeed seem causally downstream of Moore's law:
The paths to cheap energy that could succeed also follow the same pattern:
I mean R&D of packing more transistors on a chip, and the causally downstream stuff such as R&D of miniaturisation of detectors, transducers, diodes, amplifiers, etc.
For anyone who doubts deep state power:
(1) Elon's DOGE tried to investigate the Pentagon. A bit after that came the announcement that Elon would soon leave DOGE, and there's no real DOGE report about cuts to the Pentagon.
(2) Pete Hegseth was talking about 8% cuts to the military budget per year. Instead of a cut, the budget increased by 13%.
(3) Kash Patel and Pam Bondi's switch on releasing the Epstein files, and their claim that Epstein never blackmailed anyone, are remarkable.
I guess like, a larger organization with some more long term goals?
The Pentagon is a larger organization which does have long-term goals around increasing its budget and preventing its budget from being reduced. It also has long-term goals around keeping certain parts of what it does secret, which are threatened by DOGE sniffing around.
I think that's just sloppiness, though.
So if I could prove that this is not just sloppy but intentional, done to reduce the information revealed to congressional inquiries and Freedom of Information Act requests, tha...
Are there known "rational paradoxes", akin to logical paradoxes? A basic example is the following:
In the optimal search problem, the cost of searching at position i is C_i, and the a priori probability of finding the object at i is P_i.
Optimality requires sorting the search locations by non-increasing P_i/C_i: search first where the likelihood of finding divided by the cost of searching is highest.
But since the sorting cost is O(n log(n)), C_i must grow faster than O(log(i)), otherwise the sorting is asymptotically wasteful.
Do you know any others?
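For concreteness, here is a minimal sketch (my own illustration, not from the post) of the greedy rule and the expected cost it is supposed to minimize, assuming the object sits at exactly one location and is found with certainty when that location is searched:

```python
def expected_search_cost(order, P, C):
    """Expected cost of searching locations in the given order,
    stopping as soon as the object is found."""
    total, prob_not_found_yet = 0.0, 1.0
    for i in order:
        total += prob_not_found_yet * C[i]  # we only pay C[i] if we are still searching
        prob_not_found_yet -= P[i]          # chance the object was at location i
    return total

def greedy_order(P, C):
    """Sort locations by descending P_i / C_i (the rule discussed above).
    This sort is exactly the O(n log n) overhead the post points at."""
    return sorted(range(len(P)), key=lambda i: P[i] / C[i], reverse=True)

if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]   # a priori probabilities (sum to 1 for simplicity)
    C = [4.0, 1.0, 2.0]   # search costs
    order = greedy_order(P, C)              # -> [1, 0, 2] for these numbers
    print(order, expected_search_cost(order, P, C))
```

(The usual justification for the ratio rule under these assumptions is an exchange argument: swapping two adjacent locations that violate the ratio order can only increase the expected cost.)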
So P_i/C_i is in [0,1], the precision is unbounded, but for some reason a radix sort can do the job in linear time?
There could be pathological cases where all P_i/C_i are the same up to epsilon.
I guess I'm searching for situations where doing something costs c, computing c costs c', and so on... Branch prediction comes to mind.
I came across this today. Pretty cool.
"If I had only one hour to save the world, I would spend fifty-five minutes defining the problem, and five minutes finding the solution." ~Einstein, maybe
I like Quote Investigator for memetic quotes like this. It begins with
...The earliest relevant evidence located by QI appeared in a 1966 collection of articles about manufacturing. An employee of the Stainless Processing Company named William H. Markle wrote a piece titled “The Manufacturing Manager’s Skills” which included a strong match for the saying under investigation. However, the words were credited to an unnamed professor at Yale University and not to Einstein. Also, the hour was split into 40 vs. 20 minutes instead of 55 vs. 5 minutes. Boldface has b
What do you think is the cause of Grok suddenly developing a liking for Hitler? I think it might be explained by Grok being trained on more right-wing data, which accidentally activated this behavior, since similar things happen in open research.
For example, you just need to train the model on insecure code, and the model can treat the insecure-code feature as part of the evil-persona feature, so it will generally amplify the evil-persona feature and start praising Hitler, endorsing AI enslaving humans, etc., like i...
From a simulator perspective you could argue that Grok:
Maybe I'm reaching here but this seems plausible to me.