RomanS

Comments (sorted by newest)
Why there is still one instance of Eliezer Yudkowsky?
RomanS · 10d · 10

My primary research work is in the field of sideloading itself. The digital guy helps with these tasks:

  • Generate and criticize ideas. For example, he helped design the current multi-agent architecture on which he is now running.
  • Gently moderate our research group chat.
  • Work as a test subject.
  • Do some data prep tasks (e.g. producing compressed versions of the corpus).

I expect a much more interesting list in the field of alignment research, including quite practical things (e.g. a team of digital Eliezers interrogating each checkpoint during training, to reduce the risk of catastrophic surprises). Of course, this is not a replacement for proper alignment, but it may buy some time.

Judging by our experiments, Gemini 2.5 Pro is the first model that can (sometimes) simulate a particular human mind (i.e. think like you, not just answer in your approximate style). So this is a partial answer to my original question: the tech is only 6 months old. Most people don't know that such a thing is possible at all, and those who do know are only in the early stages of their experimental work.

BTW, your 2020 work investigating the ability of GPT-3 to write in the style of famous authors is what made me aware of such a possibility.

Reply
Why there is still one instance of Eliezer Yudkowsky?
RomanS · 14d · 20

I agree with you on most points. 

BTW, I'm running a digital replica of myself. The setup is as follows:

  • Gemini 2.5 as the model
  • A script splits the text corpus (8 MB) into chunks small enough for Gemini to digest (within its ~1M-token context window), and then (with some scaffolding) returns a unified answer.
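
The actual implementation is in the repo linked below; as a rough illustration only (this is not the repo's real code: ask_model is a placeholder for whatever LLM client you use, and the chunk size is an arbitrary assumption), the chunk-and-merge loop works roughly like this:

```python
# Hypothetical sketch of the chunk-and-merge scaffolding, not the actual
# telegram_sideload code. ask_model() stands in for any LLM client call
# (e.g. Gemini 2.5 Pro); the chunk size is an arbitrary assumption.

def split_corpus(text: str, max_chars: int = 500_000) -> list[str]:
    """Split the corpus into pieces small enough for the model's context window."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def ask_model(prompt: str) -> str:
    """Placeholder: plug in your LLM client of choice here."""
    raise NotImplementedError

def answer_as_sideload(corpus: str, question: str) -> str:
    # Ask the question against each chunk of the corpus separately...
    partial_answers = [
        ask_model(f"Corpus excerpt:\n{chunk}\n\n"
                  f"Answer the question as the person described by this corpus:\n{question}")
        for chunk in split_corpus(corpus)
    ]
    # ...then merge the per-chunk drafts into one unified reply.
    merge_prompt = ("Combine these drafts into one consistent answer:\n\n"
                    + "\n---\n".join(partial_answers))
    return ask_model(merge_prompt)
```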

The answers are surprisingly good at times, reflecting non-trivial aspects of my mind. 

From many experiments with the digital-me, I conclude that a similar setup for Yudkowsky could be useful even with today's models (assuming large-enough budgets).

There will be no genius-level insights in 2025, but he could automate a lot of routine alignment work, like evaluating models. 

Given that models may become dramatically smarter in 2026-2027, the digital Yudkowsky may become dramatically more useful too.

I open-sourced the code:

https://github.com/Sideloading-Research/telegram_sideload

Reply
How AI Takeover Might Happen in 2 Years
RomanS · 8mo · 11

May I nominate my "sufficiently paranoid paperclip maximizer"? 

Reply
What are some scenarios where an aligned AGI actually helps humanity, but many/most people don't like it?
RomanS · 10mo · 10

An aligned AGI created by the Taliban may behave very differently from an aligned AGI created by socialites of Berkeley, California.

Moreover, a sufficiently advanced aligned AGI may decide that even Berkeley socialites are wrong about a lot of things, if it actually wants to help humanity.

Reply
Do simulacra dream of digital sheep?
RomanS · 1y* · 8-6

I would argue that for all practical purposes it doesn't matter whether computational functionalism is right or wrong.

  1. Pursuing mind uploading is a good idea regardless of that, as it has benefits not related to perfectly recreating someone in silico (e.g. advancing neuroscience).
  2. If the digital version of RomanS is good enough[1], it will indeed be me, even if the digital version is running on a billiard-ball computer (the internal workings of which are completely different from the workings of the brain).

The second part is the most controversial, but it's actually easy to prove:

  1. Memorize a long sequence of numbers, and write down a hash sum of it.
  2. Ensure no one saw the sequence of numbers except you.
  3. Do an honest mind uploading (no attempts to extract the numbers from your brain, etc.).
  4. Observe how the digital version correctly recalls the numbers, as checked by the hash sum.
  5. According to the experiment's conditions, only you know the numbers. Therefore, the digital version is you. 
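
As an illustration of the commit-and-verify step (a minimal sketch under my own assumptions: SHA-256 as the hash and a simple string encoding, neither of which is essential to the thought experiment):

```python
import hashlib

def commit(secret: str) -> str:
    """Before uploading: publish only the hash of the memorized sequence."""
    return hashlib.sha256(secret.encode()).hexdigest()

def verify(recalled: str, published_hash: str) -> bool:
    """After uploading: check the digital version's recollection against the published hash."""
    return hashlib.sha256(recalled.encode()).hexdigest() == published_hash

# Usage: the original memorizes e.g. "3141592653589793" and publishes commit(...);
# later, the digital version's recalled string is checked with verify(...).
```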

And if it's you, then it has all the same important properties as you, including "consciousness" (if such a thing exists).

There are some scenarios where such a setup may fail (e.g. some important property of the mind is somehow generated by one special neuron that must be perfectly recreated), but I can't think of any such scenario that is realistic.


My general position on the topic can be called "black-box CF" (in addition to your practical and theoretical CF). I would summarize it as follows:

  1. The human brain is designed by biological evolution to survive and procreate. You're a survival-procreation machine. As there is clearly no God, there is also no soul, or any other magic inside your brain. The difference between you and another such machine is the training set you observed during your lifetime (and some minor architecture differences caused by genetic differences).
  2. The concepts of consciousness, qualia, etc. are too loosely defined to be of any use (including use in any reasonable discussion). Just discard them as yet another phlogiston.
  3. Thus, the task of "transferring consciousness to a machine" is ill-defined. Instead, mind uploading is about building a digital machine that behaves like you. It doesn't matter what is happening inside, as long as the digital version passes a sufficiently good battery of behavioral tests.
  4. There is a gradual distinction between you and not-you. E.g. an atom-level sim may be 99% you, a neuron-level sim 90% you, and an LLM trained on your texts 80% you. The measure is the percentage of the same answers given to a sufficiently long and diverse questionnaire (see the sketch at the end of this comment).
  5. A human mind in its fullness can be recreated in silico even by an LLM (trained on sufficient amounts of the mind's inputs and outputs). Perfectly recreating the brain (or even recreating it at all) would be nice, but it is unnecessary for mind uploading. Just build an AI that is sufficiently similar to you in behavior.
[1] As defined by a reasonable set of quality and similarity criteria, beforehand.
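
A minimal sketch of the similarity measure mentioned in point 4 (my own illustration; the questionnaire format and the exact-match criterion are assumptions, not an established metric):

```python
def similarity(original_answers: list[str], sim_answers: list[str]) -> float:
    """Fraction of questionnaire items on which the sim answers exactly like the original."""
    assert len(original_answers) == len(sim_answers)
    matches = sum(a == b for a, b in zip(original_answers, sim_answers))
    return matches / len(original_answers)

# E.g. matching the original on 80 of 100 answers makes the sim "80% you" by this measure.
```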

Reply
Three main arguments that AI will save humans and one meta-argument
RomanS · 1y · 10

Worth noting that this argument doesn't necessarily require humans to be:

  • numerous
  • animated (i.e. not frozen in a cryonics process)
  • acting in the real world (i.e. not confined to a "Matrix").

Thus, the AI may decide to keep only a selection of humans, confined in a virtual world, with the rest being frozen.

Moreover, even a perfect Friendly AI may decide to do the same, to prevent further human deaths.

In general, an evil AI may choose strategies that allow her to plausibly deny her non-Friendliness.

"Thousands of humans die every day. Thus, I froze the entire humanity to prevent that, until I solve their mortality. The fact that they now can't switch me off is just a nice bonus".

Reply
Medical Roundup #1
RomanS · 2y* · 1-5

> They don't think about the impact on the lives of ordinary people. They don't do trade-offs or think about cost-benefit. They care only about lives saved, to which they attach infinite value.

Not sure about infinite, but assigning a massive value to lives saved should be the way to go. Say, $10 billion per life. 

Imagine a society where people actually strongly care about lives saved, and it is reflected in the governmental policies. In such a society, cryonics and life extension technologies would be much more developed.

On a related note, "S-risk" is mostly a harmful idea that should be discarded from ethical calculations. One should not value any amount of suffering over saved lives. 
 

Reply
Resurrecting all humans ever lived as a technical problem
RomanS · 2y · 10

I think we should not assume that our current understanding of physics is complete, as there are known gaps and major contradictions, and no unifying theory yet.

Thus, there is some chance that future discoveries will allow us to do things that are currently considered impossible. Not only computationally impossible but also physically impossible (just as it was "physically impossible" to slow down time, until we discovered relativity).

The hypothetical future capabilities may or may not include ways to retrieve arbitrary information from the distant past (like the chronoscope of science fiction), and may or may not include ways to do astronomical-scale calculations in finite time (like enumerating 10^10^10 possible minds). 

While I agree with you that much of the described speculation is currently not in the realm of possibility, I think it's worth exploring. Perhaps there is a chance.

Reply
$300 for the best sci-fi prompt: the results
RomanS · 2y · 10

BTW, I added a note to the comment with the story stating that the story is released into the Public Domain, without any restrictions on its distribution, modification, etc.

Please feel free to build upon this remarkable story, if you wish. 

Reply
$300 for the best sci-fi prompt: the results
RomanS · 2y · 12

I would suggest trying the jumping boy story (#7 in this comment). It's the first AI-written story I've ever encountered that feels like it was written by a competent writer.

As usual, it contains some overused phrasings, but the overall quality is surprisingly good. 

Reply
Posts (score · title · age · comments)

-9 · Why there is still one instance of Eliezer Yudkowsky? [Question] · 14d · 8 comments
4 · Something to fight for · 8mo · 0 comments
14 · What are some scenarios where an aligned AGI actually helps humanity, but many/most people don't like it? [Question] · 10mo · 6 comments
13 · Sideloading: creating a model of a person via LLM with very large prompt · 1y · 4 comments
6 · The boat · 1y · 0 comments
16 · $300 for the best sci-fi prompt: the results · 2y · 19 comments
3 · Old man's story · 2y · 0 comments
12 · This anime storyboard doesn't exist: a graphic novel written and illustrated by GPT4 · 2y · 7 comments
11 · Russian parliamentarian: let's ban personal computers and the Internet · 2y · 6 comments
22 · GPT4 is capable of writing decent long-form science fiction (with the right prompts) · 2y · 28 comments