Consider two claims:
* Any system can be modeled as maximizing some utility function, therefore
utility maximization is not a very useful model
* Corrigibility is possible, but utility maximization is incompatible with
corrigibility, therefore we need some non-utility-maximizer kind of agent to
achieve corrigibility
These two claims should probably not both be true! If any system can be modeled
as maximizing a utility function, and it is possible to build a corrigible
system, then naively the corrigible system can be modeled as maximizing a
utility function.
I expect that many people's intuitive mental models around utility maximization
boil down to "boo utility maximizer models", and they would therefore
intuitively expect both the above claims to be true at first glance. But on
examination, the probable-incompatibility is fairly obvious, so the two claims
might make a useful test to notice when one is relying on yay/boo reasoning
about utilities in an incoherent way.
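To make the first claim concrete, here is a toy construction (my own sketch, not from the post; `corrigible_policy` is a hypothetical stand-in): any deterministic policy is trivially the argmax of a utility function that scores the policy's own choice 1 and everything else 0.

```python
# Toy version of "any system can be modeled as maximizing a utility function":
# given any deterministic policy, build a utility function it maximizes.

def utility_from_policy(policy):
    """Utility that assigns 1 to whatever the policy does, 0 otherwise."""
    return lambda state, action: 1.0 if action == policy(state) else 0.0

# A hypothetical "corrigible" policy: shut down whenever asked, else work.
def corrigible_policy(state):
    return "shutdown" if state == "shutdown_requested" else "work"

u = utility_from_policy(corrigible_policy)
actions = ["work", "shutdown"]

# The corrigible policy is exactly the argmax of u in every state, so it
# too "maximizes a utility function" -- hence the tension between the claims.
for state in ["normal", "shutdown_requested"]:
    assert max(actions, key=lambda a: u(state, a)) == corrigible_policy(state)
```

Of course the substantive corrigibility debates are about non-trivial utility functions over world-states; the construction only shows why the two claims can't both be leaned on at once.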
3kuira10h
Sometimes I have an internal desire to do something different than
what I think should be done (for example, I might desire to play a game while
also thinking the better choice is to read). I've been experimenting with using
randomness to mediate this. I keep a D20 with me, give each side of the dispute
some odds proportional to the strength of its resolve, and then roll the die.
In theory, this means neither side will overpower the other, and even a small
resolve still has a chance. I'm not sure how useful this is, but it's fun, and
can sort of give me motivation (I've tried to internalize this kind of roll as a
rule not to break without good reason).
Also, when I'm merely deciding between some options, sometimes I'll roll more
casually with equal odds, and it'll help me realize that I already know which it
is I really wanted to do (if I don't like the roll's outcome).
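The die procedure above can be sketched as code (a minimal sketch under my reading of the post; the option names and resolve weights are made up):

```python
import random

def d20_decide(options):
    """Give each option a share of the 20 faces proportional to its
    'resolve' weight, roll the die, and return the winning option."""
    total = sum(weight for _, weight in options)
    roll = random.randint(1, 20)  # the physical D20 roll
    boundary = 0.0
    for name, weight in options:
        boundary += 20 * weight / total  # cumulative faces owned so far
        if roll <= boundary:
            return name
    return options[-1][0]  # guard against float rounding at the top face

# "read" has three times the resolve of "play", so it wins on rolls 1-15.
choice = d20_decide([("read", 3), ("play", 1)])
```

Even a small resolve keeps at least a sliver of the faces, which is the "neither side overpowers the other" property described above.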
2NicholasKross7h
In response to / inspired by this SSC post
[https://astralcodexten.substack.com/p/your-incentives-are-not-the-same]:
I was originally going to comment something about "how do I balance this with
the need to filter for niche nerds who are like me?", but then I remembered that
the post is actually literally about dunks/insults on Twitter. o_0
This, in meta- and object-level ways, got to a core problem I have: I want to do
smart and nice things with smart and nice people, yet these (especially the
social stuff) require me to be so careful + actually have anything like a
self-filter. And even trying to practice/exercise that basic self-filtering
skill feels physically draining. (ADHD + poor sleep btw, but just pointing these
out doesn't do much!)
To expand on this (my initial comment
[https://astralcodexten.substack.com/p/your-incentives-are-not-the-same/comment/17376134]):
While I love being chill and being around chill people, I also (depending on my
emotional state) can find it exhausting to do basic social things like "not
saying every thought that I think" and "not framing every sentence I say as a
joke".
I was once given the "personal social boundaries" talk by some family members.
One of them said they were uncomfortable with a certain
behavior/conversational-thing I did. (It was probably something between "fully
conscious" and "a diagnosable tic"). And I told them flat-out that I would have
trouble staying in their boundary (which was extremely basic and reasonable of
them to set, mind you!), and that I literally preferred
not-interacting-with-them to spending the energy to mask.
Posts like this remind me of how scared of myself I sometimes am, and maybe
should be? I'm scared of being either [ostracized by communities I deeply
love] or [exhausting myself by "masking" all the time]. And I don't really know
how to escape this, except by learned coping mechanisms that are either (to me)
"slowly revealing more of myself and being more casual, in proport
2Douglas_Knight13h
Someone just told me that the solution to conflicting experiments is more
experiments. Taken literally this is wrong: more experiments just means more
conflict. What we need are fewer experiments. We need to get rid of the bad
experiments.
Why expect that future experiments will be better? Maybe if the experimenters
read the past experiments, they could learn from them. Well, maybe, but maybe if
you read the experiments today, you could figure out which ones are bad today.
If you don't read the experiments today and don't bother to judge which ones are
better, what incentive is there for future experimenters to make better
experiments, rather than accumulating conflict?
1Dalcy Bremin2h
What's a good technical introduction to Decision Theory and Game Theory for
alignment researchers? I'm guessing standard undergrad textbooks don't include,
say, content about logical decision theory. I've mostly been reading posts on LW
but as with most stuff here they feel more like self-contained blog posts
(rather than textbooks that build on top of a common context) so I was wondering
if there was anything like a canonical resource providing a unified technical /
math-y perspective on the whole subject.
A THOUSAND NARRATIVES. THEORY OF MEMETIC EVOLUTION. PART 1/20. INTRO
The ultimate goal of this line of research is to gain a better understanding of
how the human value system operates. The problem I see with current approaches
to studying values is that we cannot study {values/desires/preferences} in
isolation from the rest of our cognitive machinery, because according to the
latest theories values are just one part of a broader system governing behaviour
in general. Given that, you have to have a decent model of human behaviour first
to then be able to explain value dynamics.
To get a good theory of the mind you have to meet multiple requirements:
1. A good theory of the mind must span at least four different timescales:
(genetic evolution) for the billion years in which our brains have evolved;
(memetic evolution) for the centuries of cultural accumulation of ideas
through history; (personal) for individual development during a lifetime;
and (neuronal) for the milliseconds during which cognitive inference happens.
2. A good theory must explain behaviour of the system on each of Marr’s three
levels of analysis[1]: (1) the computational problem the system is solving;
(2) the algorithm the system uses to solve that problem; and (3) how that
algorithm is implemented in the “physical hardware” of the system. And, the
part I think Marr is missing, the third level also has to include an
explanation of how the learning environment affects the agent
[https://www.lesswrong.com/posts/RCbofC8fCJ6NnYti7/intro-to-ontogenetic-curriculum].
3. A good theory must at least make an attempt at answering the main questions:
how is the generality of intelligence achieved?; what is the neural
substrate of memory?; etc.
To meet these requirements I’ve combined insights from several fields:
Developmental Psychology, Neuroscience, Ethology, and Computational models of mind.
The result is the Narrative Theory. The research is still far from completion
but there ar
ACCURATELY ASSESSING SEX-RELATED CHARACTERISTICS SAVES LIVES. CAN WE MAKE IT
FAIR TO ALL HUMANS, WOMEN, MEN, TRANS AND INTER FOLKS? A NERDY IDEA.
Sex-related characteristics are medically relevant; accurately assessing them
saves lives.
But neither assigned sex nor gender identity alone properly capture them. Is
anyone else interested in designing a characteristic string instead, so all
humans, esp. all women and gender diverse folks, get proper medical care?
This idea started yesterday, when I had severe abdominal pain, and started
googling.
Eventually, I reached sites that listed various potential conditions. Some occur
in all people (e.g., stomach ulcers), albeit often not with the same
presentation and frequency; others have very specific sex-based requirements
(e.g. ovarian cyst, or testicular torsion).
Some webpages introduced ovary-related things as “In women, it can also be…”
Well, I thought - I highly doubt my trans girlfriend has an ovarian cyst. But we
are used to getting medical advice that does not fit for her, aren't we? (In
retrospect, why did I think that was okay, just because it was so common?)
Other sites, apparently wanting to prevent this, stated “we use female in this
text to refer to people assigned female at birth”. I was happy that they had
thought about this and cared, but… frankly, that does not work either. I was
assigned female at birth; that means I was born, and a doctor visually inspected
me, and declared “female”. And yet I most certainly do not have a fallopian tube
pregnancy now, because I had my tubes surgically removed, which also
sterilised me. I’m as likely as the dude next door to have a fallopian tube
pregnancy now. An inter person assigned female at birth may also be dead certain
they do not have an ectopic pregnancy, because their visual inspection at birth
actually misjudged their genes and organs quite a bit.
I wondered what I would have liked the website writers to use instead. And the
more I thought about it, I th
Does anyone here know of (or would be willing to offer) funding for creating
experimental visualization tools?
I’ve been working on a program which I think has a lot of potential, but it’s
the sort of thing where I expect it to be most powerful in the context of
“accidental” discoveries made while playing with it (see e.g. early use of the
microscope, etc.).
2Prometheus3d
The following is a conversation between myself in 2022, and a newer version of
me earlier this year.
On the Nature of Intelligence and its "True Name":
2022 Me: This has become less obvious to me as I’ve tried to gain a better
understanding of what general intelligence is. Until recently, I always made the
assumption that intelligence and agency were the same thing. But General
Intelligence, or G, might not be agentic. Agents that behave like reinforcement
learners may only
be narrow forms of intelligence, without generalizability. G might be something
closer to a simulator. From my very naive perception of neuroscience, it could
be that we (our intelligence) are not agentic, but just simulate agents. In this
situation, the prefrontal cortex not only runs simulations to predict its next
sensory input, but might also run simulations to predict inputs from other parts
of the brain. In this scenario, “desire” or “goals”, might be simulations to
better predict narrowly-intelligent agentic optimizers. Though the simulator
might be myopic, I think this prediction model allows for non-myopic behavior,
in a similar way GPT has non-myopic behavior, despite only trying to predict the
next token (it has an understanding of where a future word “should” be within
the context of a sentence, paragraph, or story). I think this model of G allows
for the appearance of intelligent goal-seeking behavior, long-term planning, and
self-awareness. I have yet to find another model for G that allows for all
three. The True Name of G might be Algorithm Optimized To Reduce Predictive
Loss.
2023 Me: interesting, me’22, but let me ask you something: you seem to think
this majestic ‘G’ is something humans have, but other species do not, and then
name the True Name of ‘G’ to be Algorithm Optimized To Reduce Predictive Loss.
Do you *really* think other animals don’t do this? How long is a cat going to
survive if it can’t predict where it’s going to land? Or where the mouse’s path
trajectory is heading? Did you th
2devansh3d
(I promised I'd publish this last night no matter what state it was in, and then
didn't get very far before the deadline. I will go back and edit and improve it
later.)
I feel like I keep, over and over, hearing a complaint from people who get most
of their information about college admissions from WhatsApp groups or their
parents’ friends or a certain extraordinarily pervasive subreddit (you all know
what I’m talking about). Something like “College admissions is ridiculous! Look
at this person, who was top of his math class and took 10 AP classes and started
lots of clubs, he didn’t get into a single Ivy, he’s going to UCLA!” I think the
closest allegory I can find for this is something like “look at this guy, he’s 7
feet tall, didn’t even make it to the NBA!” There’s something important that
they’re both missing, some fundamental confusion of a tiny part of the overall
metric from reality.
2riceissa3d
I used to have a model of breathing that went something like this: when
breathing in, the lungs somehow get bigger, creating lower air pressure inside
the lungs causing air to flow in. Then when breathing out the lungs get smaller,
creating higher air pressure inside the lungs and causing air to flow out. How
do the lungs get bigger and smaller? Eventually I learned that there's a muscle
called the diaphragm that is attached to the bottom of the lungs (??) that pulls
or pushes the lungs. If I keep my nose plugged but my mouth open, the air will
travel through my mouth. If I keep my mouth closed but my nose open, the air
will travel through my nostrils. So far, so good.
Then a few days ago, I noticed that if I keep both my nose and mouth open, I
could choose to breathe in solely through one or the other. This... doesn't make
sense, according to the model. The model would predict that the air just flows
through both pathways, maybe preferentially going through the mouth since that
seems like the larger pathway.
So something is clearly wrong with how I think about breathing. Is there some
sort of further switch inside that blocks one of the pathways? Does the nose or
the mouth contain variable-size cavities that can control air pressure to direct
the flow? I still have no idea. I'm eventually going to look it up, but I might
think about this for a little bit longer (or maybe someone here will tell me).
I thought this was a pretty interesting example of how the explanations you hear
about seemingly-basic things are easy to accept but don't make sense on further
reflection. But it's hard to notice the flaw too. In my case, after a recent ENT
visit where I was told my nasal passages are inflamed, I've been putting more
effort into consciously breathing through my nose. Then one day I woke up and as
soon as I woke up I did something like consciously breathe through my nose with
mouth closed, and then somehow I opened my mouth but then still tried to breathe
through my n
This got deleted from 'The Dictatorship Problem
[https://www.lesswrong.com/posts/pFaLqTHqBtAYfzAgx/the-dictatorship-problem]',
which is catastrophically anxiety-brained, so here's the comment:
This is based in anxiety, not logic or facts. It's an extraordinarily weak
argument.
There's no evidence presented here which suggests rich Western countries are
backsliding. Even the examples in Germany don't have anything worse than the US
GOP produced ca. 2010. (And Germany is, due to their heavy censorship, worse at
resisting fascist ideology than anyone with free speech, because you can't
actually have those arguments in public.) If you want to present this case, take
all those statistics and do economic breakdowns, e.g. by deciles of per-capita
GDP. I expect you'll find that, for example, the Freedom House numbers show a
substantial drop in 'Free' in the 40%-70% range and essentially no drop in
80%-100%.
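The suggested breakdown could be sketched like this (all figures below are purely hypothetical placeholders, not real Freedom House data; the point is only the shape of the analysis):

```python
# Sketch: bucket countries by per-capita GDP and compare the share rated
# 'Free' in each bucket across two years. All rows are invented.

countries = [
    # (gdp_per_capita_usd, free_2010, free_2020) -- hypothetical rows
    (60000, True, True), (55000, True, True), (48000, True, True),
    (15000, True, False), (12000, True, False), (9000, False, False),
    (3000, False, False), (2000, False, False),
]

def free_share(rows, col):
    """Fraction of rows rated 'Free' in the given year column."""
    return sum(r[col] for r in rows) / len(rows)

rich = [c for c in countries if c[0] >= 30000]
middle = [c for c in countries if 5000 <= c[0] < 30000]

# The comment's prediction, in this toy data: the drop concentrates in
# the middle-income band, with no drop among the rich countries.
drop_rich = free_share(rich, 1) - free_share(rich, 2)
drop_middle = free_share(middle, 1) - free_share(middle, 2)
```

Running the same aggregation over the actual country-level statistics would test whether the backsliding is concentrated outside rich Western countries, as the comment predicts.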
Of the seven points given for the US, all are a mix of maximally-anxious
interpretation and facts presented misleadingly. These are all arguments where
the bottom line ("Be Afraid") has been written first; none of this is reasonable
unbiased inference.
The case that mild fascism could be pretty bad is basically valid, I guess, but
without the actual reason to believe that's likely, it's irrelevant, so it's
mostly just misleading to dwell on it.
Going back to the US points, because this is where the underlying anxiety prior
is most visible:
Interpretation, not fact. We're still in early enough stages that the reality of
Biden is being compared to an idealized version of Trump - the race isn't in
full swing yet and won't be for a while. Check back in October when we see how
the primary is shaping up and people are starting to pay attention.
This has been true for a while. Also, in assessing the consequences, it's
assuming that Trump will win, which is correlated but far from guaranteed.
Premise is a fact, conclusion is interpretation, and not at all a reliable one.
9mako yass4d
There's something very creepy to me about the part of research consent forms
where it says "my participation was entirely voluntary."
1. Do they really think an involuntary participant wouldn't sign that? If they
understand that they would, what purpose could this possibly serve, other
than, as is commonly the purpose of contracts, absolving themselves of blame
and moving blame to the participant? Which would be downright monstrous.
Probably they just aren't fucking consequentialists, but this is all they
end up doing.
2. This is a minor thing, but it adds an additional creepy garnish: Nothing is
100% voluntary, because everything is a function of the involuntary base
reality that other people command force and resources and we want to use
them for things so we have to go along with what other people want to some
extent. I'm at peace with this, and I would prefer not to have to keep
denying it, and it feels like I'm being asked to participate in the addling
of moral philosophy.
3Johannes C. Mayer4d
I have a heuristic for evaluating potential writing topics: I especially look
for topics that people are usually averse to writing about. Topics that score
high on this heuristic might be good to write about, as they can yield content
with high utility compared to what is available, simply because other content
of this kind (and especially good content of this kind) is rare.
Somebody told me that they read some of my writing and liked it. They said that
they liked how honest it was. Perhaps writing about topics that are selected
with this heuristic tends to invoke that feeling of honesty. Maybe just by being
about something that people normally don't like to be honest about, or talk
about at all. That might at least be part of the reason.
2lc4d
"No need to invoke slippery slope fallacies, here. Let's just consider the
Czechoslovakian question in of itself" - Adolf Hitler
1James Spencer4d
WILL INTERNATIONAL AI ALIGNMENT COOPERATION TRUMP THE RIGHTS OF WEAKER
COUNTRIES?
TLDR - REAL COOPERATION ON INTERNATIONAL AI REGULATION MAY ONLY BE POSSIBLE
THROUGH A MUCH MORE PEACEFUL BUT UNSENTIMENTAL FOREIGN POLICY
In 1987 President Reagan said to the United Nations "how quickly our differences
worldwide would vanish if we were facing an alien threat from outside this
world." Isn't an unaligned Artificial General Intelligence that alien threat?
And it's easy - and perhaps overly obvious and comforting - to say that humanity
would unite, but now that we have this threat, what would that unity look like?
Here's one not necessarily comforting thought: the weak (nations) will get
trampled further by the strong (nations). If cooperation rather than
competition among powers is vital, then wouldn't we need to prioritise keeping
powerful and potentially powerful countries onside - at least in AI terms -
over other ideological concerns? To see what this looks like, let's consider
some of those powerful countries:
* China - the obvious one, would we need to annoy the national security hawks
over Taiwan, but also decent, humane liberals over Tibet and Sichuan?
* Russia - Ukraine would annoy just about everybody
* Israel - Well this happens already because of domestic considerations, but it
might reverse domestic political calculations on:
* UK - the British are a big player in AI (and seemingly more important than
the EU) so would needling them about Northern Ireland really be worth ticking
off the one reliable ally the US has with clout?
This is before looking at the role of countries that may be important in
relation to AI and who the US wouldn't want going rogue on regulation but who
neighbour China - such as Japan, South Korea and the chip superpower Taiwan.
Epistemic activism
I think LW needs better language to talk about efforts to "change minds." Ideas
like asymmetric weapons and the Dark Arts are useful but insufficient.
In particular, I think there is a common scenario where:
* You have an underlying commitment to open-minded updating and possess
evidence or analysis that would update community beliefs in a particular
direction.
* You also perceive a coordination problem that inhibits this updating process
for a reason that the mission or values of the group do not endorse.
* Perhaps the outcome of the update would be a decline in power and status
for high-status people. Perhaps updates in general can feel personally or
professionally threatening to some people in the debate. Perhaps there's
enough uncertainty in what the overall community believes that an
information cascade has taken place. Perhaps the epistemic heuristics used
by the community aren't compatible with the form of your evidence or
analysis.
* Solving this coordination problem to permit open-minded updating is difficult
due to lack of understanding or resources, or by sabotage attempts.
When solving the coordination problem would predictably lead to updating, then
you are engaged in what I believe is an epistemically healthy effort to change
minds. Let's call it epistemic activism for now.
Here are some community touchstones I regard as forms of epistemic activism:
* The founding of LessWrong and Effective Altruism
* The one-sentence declaration on AI risks
* The popularizing of terms like Dark Arts, asymmetric weapons, questionable
research practices, and "importance hacking."
* Founding AI safety research organizations and PhD programs to create a
population of credible and credentialed AI safety experts; calls for AI
safety researchers to publish in traditional academic journals so that their
research can't be dismissed for not being subject to institutionalized peer
review
2Dalcy Bremin5d
Why haven't mosquitoes evolved to be less itchy? Is there just not enough
selection pressure posed by humans yet? (yes probably) Or are they evolving
towards that direction? (they of course already evolved towards being less itchy
while biting, but not enough to make that lack-of-itch permanent)
this is a request for help i've been trying and failing to catch this one for
god knows how long plz halp
tbh would be somewhat content coexisting with them (at the level of houseflies)
as long as they evolved the itch and high-pitch noise away, modulo disease risk
considerations.
1O O6d
A realistic takeover angle would be hacking into robots once we have them. We
probably don’t want any way for robots to get over-the-air updates, but it’s
unlikely that this will be banned.
Having lived ~19 years, I can distinctly remember around 5~6 times when I
explicitly noticed myself experiencing totally new qualia with my inner
monologue going “oh wow! I didn't know this dimension of qualia was a thing.”
examples:
* hard-to-explain sense that my mind is expanding horizontally with fractal
cube-like structures (think bismuth) forming around it and my subjective
experience gliding along its surface which lasted for ~5 minutes after taking
zolpidem for the first time to sleep (2 days ago)
* getting drunk for the first time (half a year ago)
* feeling absolutely euphoric after having a cool math insight (a year ago)
* ...
Reminds me of myself around a decade ago, completely incapable of understanding
why my uncle smoked, thinking "huh? The smoke isn't even sweet, why would you want
to do that?" Now that I have [addiction-to-X] as a clear dimension of
qualia/experience solidified in myself, I can better model their subjective
experiences although I've never smoked myself. Reminds me of the SSC classic
[https://slatestarcodex.com/2014/03/17/what-universal-human-experiences-are-you-missing-without-realizing-it/].
Also one observation is that it feels like the rate at which I acquire these is
getting faster, probably because of an increase in self-awareness + increased
option space as I reach adulthood (like being able to drink).
Anyways, I think it’s really cool, and can’t wait for more.
4DirectedEvolution6d
Lightly edited for stylishness
3Dagon6d
I give some probability space to being a Boltzmann-like simulation. It's
possible that I exist only for an instant, experience one quantum of
input/output, and then am destroyed (presumably after the extra-universal
simulators have measured something about the simulation).
This is the most minimal form of Solipsism that I have been configured to
conceive. It's also a fun variation of MWI (though not actually connected
logically) if it's the case that the simulators are running multiple parallel
copies of any given instant, with slightly different configurations and inputs.
3DirectedEvolution6d
I use ChatGPT as a starting point to investigate hypotheses to test at my
biomedical engineering job on a daily basis. On specific problems I am able to
independently approach the level of understanding of an experienced chemist,
although his familiarity with our chemical systems and his education make him
faster to arrive at the same result.
This is a lived example of the phenomenon in which AI improves the performance
of the lower-tier performers more than the higher-tier performers (I am a recent
MS grad, he is a post-postdoc).
So far, I haven't been able to get ChatGPT to independently troubleshoot
effectively or propose improvements. This seems to be partly because it
struggles profoundly to grasp and hang onto the specific details I have provided
to it. It's as if our specific issue gets mixed in with the more general
problems it has encountered in its training. Or as if, whereas in the real
world, strong evidence is common
[https://www.lesswrong.com/posts/JD7fwtRQ27yc8NoqS/strong-evidence-is-common],
to ChatGPT, what I tell it is only weak evidence. And if you can't update
strongly on evidence in my research world, you just can't make progress.
The way I use it instead is to validate and build confidence in my conjectures,
and as an incredibly sophisticated form of search. I can ask it how very
specific systems we use in our research, not covered in any one resource, likely
work. And I can ask it to explain how complex chemical interactions are likely
behaving in specific buffer and heat conditions. Then I can ask it how adjusting
these parameters might affect the behavior of the system. An iterated process
like this combines ChatGPT's unlimited generalist knowledge with my extremely
specific understanding of our specific system to achieve a concrete, testable
hypothesis that I can bring to work after a couple of hours. It feels like a
natural, stimulating process. But you do have to be smart enough to steer th
2JNS6d
I got my entire foundation torn down, and with it came everything else.
It all came crashing down in one giant heap of rubble.
I’ll just rebuild, I thought - not realizing you can’t build without a
foundation plan.
So all I’ve ended up doing was sift through the rubble, searching for things
that feel right.
Now I am back, in a very literal sense, to where it all began; so much was built,
so many things destroyed and corrupted, and a major piece ended and got buried.
And all I got is “what the eff am I doing here?”
The obvious answer is “yelling at the sky demanding answers” and being utterly
ignored.
I guess as per usual it is all up to me, except I don’t know how to rebuild
myself……again.
F…..
i absolutely hate bureaucracy, dumb forms, stupid websites etc. like, I almost
had a literal breakdown trying to install Minecraft recently (and eventually
failed). God.
3Quinn7d
"EV is measure times value" is a sufficiently load-bearing part of my worldview
that if measure and value were correlated or at least one was a function of the
other I would be very distressed.
Like in a sense, is John
[https://www.lesswrong.com/posts/voLHQgNncnjjgAPH7/utility-maximization-description-length-minimization]
threatening to second-guess hundreds of years of consensus on is-ought?
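The slogan "EV is measure times value," spelled out (my gloss, in standard expected-utility notation; the worry is precisely that a result making $\mu$ a function of $V$, or vice versa, would collapse this factorization into independent inputs):

```latex
% Expected value of an action a, as measure-weighted value over worlds w:
\[
  \mathrm{EV}(a) \;=\; \sum_{w} \mu(w \mid a)\, V(w)
\]
% where \mu is the measure (how much reality a world gets) and V is how
% much we value that world -- treated, on this worldview, as independent.
```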
3Stephen Fowler7d
Are humans aligned?
Bear with me!
Of course, I do not expect there is a single person browsing Short Forms who
doesn't already have a well thought out answer to that question.
The straightforward (boring) interpretation of this question is "Are humans
acting in a way that is moral or otherwise behaving like they obey a useful
utility function." I don't think this question is particularly relevant to
alignment. (But I do enjoy whipping out my best Rust Cohle impression
[https://www.youtube.com/watch?v=Z5vwDfg3JNQ])
Sure, humans do bad stuff but almost every human manages to stumble along in a
(mostly) coherent fashion. In this loose sense we are "aligned" to some higher
level target, it just involves eating trash and reading your phone in bed.
But I don't think this is a useful kind of alignment to build off of, and I
don't think this is something we would want to replicate in an AGI.
Human "alignment" is only being observed in an incredibly narrow domain. We
notably don't have the ability to self-modify and of course we are susceptible
to wire-heading. Nothing about current humans should indicate to you that we
would handle this extremely out of distribution shift well.
3kuira7d
it's interesting that an intelligence in the 'original'/'top-level' universe
might also [if simulation theory is valid] have reason to assume it's
close-to-certainly simulated
maybe it would do acausal trade and precommit to not shutting down simulated
intelligences
1Omega.7d
Quick updates:
* Our next critique (on Conjecture) will be published in 10 days.
* The critique after that will be on Anthropic. If you'd like to be a reviewer,
or have critiques you'd like to share, please message us or email
anonymouseaomega@gmail.com.
* If you'd like to help edit our posts (incl. copy-editing - basic grammar etc,
but also tone & structure suggestions and fact-checking/steel-manning),
please email us!
* We'd like to improve the pace of our publishing and think this is an area
where external perspectives could help us:
* Make sure our content & tone is neutral & fair
* Save us time so we can focus more on research and data gathering
The 'new user' flag being applied to old users with low karma is condescending
as fuck.
I'm not a new user. I'm an old user who has spent most of my recent time on LW
telling people things they don't want to hear.
Well, most of the time I've actually spent posting weekly meetups, but other
than that.
5Garrett Baker8d
Last night I had a horrible dream: That I had posted to LessWrong a post filled
with useless & meaningless jargon without noticing what I was doing, then I went
to sleep, and when I woke up I found I had <−60 karma on the post. When I read
the post myself I noticed how meaningless the jargon was, and I myself couldn't
resist giving it a strong-downvote.
5DirectedEvolution8d
Over the last six months, I've grown more comfortable writing posts that I know
will be downvoted. It's still frustrating. But I used to feel intensely anxious
when it happened, and now, it's mostly just a mild annoyance.
The more you're able to publish your independent observations, without worrying
about whether others will disagree, the better it is for community epistemics.
3jacquesthibs8d
AI labs should be dedicating a lot more effort to using AI for cybersecurity
as a way to prevent weights or insights from being stolen. It would be good for
safety, and it seems like it could be a pretty big cash cow too.
If they have access to the best models (or specialized), it may be highly
beneficial for them to plug them in immediately to help with cybersecurity
(perhaps even including noticing suspicious activity from employees).
I don’t know much about cybersecurity so I’d be curious to hear from someone who
does.
3Quinn8d
messy, jotting down notes:
* I saw this thread https://twitter.com/alexschbrt/status/1666114027305725953
[https://twitter.com/alexschbrt/status/1666114027305725953] which my
housemate had been warning me about for years.
* failure mode can be understood as trying to aristotle the problem, lack of
experimentation
* thinking about the nanotech ASI threat model, where it solves nanotech
overnight and deploys adversarial proteins in all the bloodstreams of all the
lifeforms.
* These are sometimes justified by Drexler's inside view of boundary conditions
and physical limits.
* But to dodge the aristotle problem, there would have to be some amount of
  bandwidth passing between sensors and actuators (which may roughly
  correspond to the number of do-applications in Pearl's calculus)
* Can you use something like communication complexity
  https://en.wikipedia.org/wiki/Communication_complexity
  [https://en.wikipedia.org/wiki/Communication_complexity] (between a system
  and an environment) to think about a "lower bound on the number of
  sensor-actuator actions", mixed with sample complexity (statistical learning
  theory)?
* Like ok, if you're simulating all of physics you can aristotle nanotech, but
  for a sufficient definition of "all" you would run up against realizability
  problems, and it would cost way more than you actually need to spend.
Like I'm thinking: if there's a kind of complexity theory of Pearl (number of
do-applications needed to achieve some kind of "loss"), then you could direct
that at something like "nanotech projects" to Fermi-estimate the way AIs might
trade off between applying Aristotelian effort (observation and induction with
no experiment) and spending sensor-actuator interactions (with the world).
There's a scenario in the Sequences, if I recall correctly, about which physics
an AI infers from 3 frames of a video of an apple falling, and something about
how security mindset suggests you shouldn't expect your information-theoretic
Eliezer recently tweeted that most people can't think, even most people here
[https://twitter.com/ESYudkowsky/status/1665165312247975937], but at least this
is a place where some of the people who can think, can also meet each other
[https://twitter.com/ESYudkowsky/status/1665439386089955330].
This inspired me to read Heidegger's 1954 book What is Called Thinking?
[https://en.wikipedia.org/wiki/What_Is_Called_Thinking%3F] (pdf
[https://www.sas.upenn.edu/~cavitch/pdf-library/Heidegger_What_Is_Called_Thinking.pdf]),
in which Heidegger also declares that despite everything, "we are still not
thinking".
Of course, their reasons are somewhat different. Eliezer presumably means that
most people can't think critically, or effectively, or something. For Heidegger,
we're not thinking because we've forgotten about Being, and true thinking starts
with Being.
Heidegger also writes, "Western logic finally becomes logistics, whose
irresistible development has meanwhile brought forth the electronic brain." So
of course I had to bring Bing into the discussion.
Bing told me what Heidegger would think of Yudkowsky
[https://pastebin.com/XccznywE], then what Yudkowsky would think of Heidegger
[https://pastebin.com/EeS9qMMg], and finally we had a more general discussion
about Heidegger and deep learning [https://pastebin.com/LPryEh0E] (warning,
contains a David Lynch spoiler). Bing introduced me to Yuk Hui
[https://en.wikipedia.org/wiki/Yuk_Hui], a contemporary Heideggerian who started
out as a computer scientist, so that was interesting.
But the most poignant moment came when I broached the idea that perhaps language
models can even produce philosophical essays, without actually thinking. Bing
defended its own sentience, and even creatively disputed the Lynchian metaphor,
arguing that its "road of thought" is not a "lost highway", just a "different
highway". (See part 17, line 254.)
6O O9d
If alignment is difficult, it is likely inductively difficult (difficult
regardless of your base intelligence), and an ASI will be cautious about
creating a misaligned successor or upgrading itself in a way that risks
misalignment.
You may argue it’s easier for an AI to upgrade itself, but if the process is
hardware-bound or requires radical algorithmic changes, the ASI will need to
create an aligned successor, as preferences and values may not transfer
directly to new architectures or hardware.
If alignment is easy we will likely solve it with superhuman narrow
intelligences and aligned near peak human level AGIs.
I think the first case is an argument against FOOM, unless the alignment problem
is solvable but only at higher than human level intelligences (human meaning the
intellectual prowess of the entire civilization equipped with narrow superhuman
AI). That would be a strange but possible world.
1
4Writer9d
Rational Animations has a subreddit:
https://www.reddit.com/r/RationalAnimations/
[https://www.reddit.com/r/RationalAnimations/]
I hadn't advertised it until now because I had to find someone to help moderate
it.
I want people here to be among the first to join since I expect having LessWrong
users early on would help foster a good epistemic culture.
2lc9d
The greatest generation imo deserves their name, and we should be grateful to
live on their political, military, and scientific achievements.
2O O9d
The fact that this was completely ignored is a little disappointing. This is a
very important question that would help put upper bounds on value drift, but it
seems that answering it limits the imagination when it comes to ASI. Has there
ever been an answer to it?
I have a feeling larger brains have a harder coordination problem between
their subcomponents, especially when you hit information-transfer limits. This
would put some hard limits on how much you can scale intelligence, but I may
be wrong.
A Fermi estimate on the upper bounds of intelligence may rule out some of the
problem classes that alignment arguments tend to include.
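One way to make the information-transfer point concrete is a back-of-the-envelope sketch (all numbers, and the framing of "globally synchronized updates" as the relevant bottleneck, are my rough assumptions rather than claims from the thread): if coordination requires signals to cross the whole system, the global update rate is capped by signal speed divided by system diameter.

```python
# Toy Fermi estimate: scaling a brain makes global coordination slower,
# because a round of coordination needs signals to traverse the system.

def max_sync_rate_hz(diameter_m, signal_speed_m_s):
    """Upper bound on globally synchronized update frequency."""
    return signal_speed_m_s / diameter_m

# Human brain: ~0.15 m across, fast myelinated axons ~100 m/s.
human = max_sync_rate_hz(0.15, 100.0)       # ceiling ~667 Hz

# A 100x-wider brain with the same biology: coordination slows 100x.
giant = max_sync_rate_hz(15.0, 100.0)       # ceiling ~6.7 Hz

# Silicon at data-center scale: ~100 m across, signals near light speed.
datacenter = max_sync_rate_hz(100.0, 3e8)   # ceiling ~3 MHz
```

On these (crude) assumptions, biology does hit a coordination wall when scaled up, but silicon's far higher signal speed buys back the latency cost of larger size, so the biological limit wouldn't by itself bound machine intelligence.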