
If we know the correct answers to decision theory problems, then we have some internal instrument for finding them: either a theory or a "vibe meter". 

Claude seems to have learned to mimic our internal vibe meter. 

The problem is that this will not work out of distribution. 

Yes, a great variant of the universal answer-improving prompt, and it can be applied several times to any content. 

If the simulation argument is valid and dreams are simulations of reality, can we apply the simulation argument to dreams? If not, is this an argument against the simulation argument? If yes, why am I not in a dream right now?

If I experience something, is it more likely to be a dream or reality?
Sleeping takes only one-third of my time, and REM sleep takes even less.
But:

  • Some dreams occur in other phases of sleep as well.
  • Dreams are much more eventful than normal life: there is always something happening. The distribution of events in dreams is also skewed toward intense, dangerous, adventurous content, full of social interactions.
  • There is an eraser of dream memory: it wipes dream memories roughly every 15 minutes, again after awakening, and further during the day. As a result, we underestimate the number of dreams we have had.

As a result, the number of important events in dreams may be several orders of magnitude greater than in real life. I think a good estimate is 100 times, but it depends on the type of event. For recurrent dreams, like big waves and war in my case, it can be much higher.

So why am I not in a dream now? Because writing coherent, dream-aware (lucid) text is not the dominant type of dream content. But if I were being chased by a monster or by big waves, I should assign higher prior odds to actually being in a dream.
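To make the implied calculation explicit, here is a sketch in odds form; the specific numbers (the 1:10 prior and the likelihood ratios) are illustrative assumptions, not measurements:

$$\frac{P(\text{dream}\mid E)}{P(\text{awake}\mid E)} = \frac{P(E\mid \text{dream})}{P(E\mid \text{awake})} \times \frac{P(\text{dream})}{P(\text{awake})}$$

If experienced dream-time is roughly one-tenth of experienced waking time (prior odds 1:10), and being chased by a monster is about 100 times more frequent per hour of dreaming than per hour of waking life, then the posterior odds are about 100 × 1/10 = 10:1 in favor of dreaming. For writing long, coherent text the likelihood ratio is far below 1, so the same formula points strongly toward being awake.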

Conclusion: The simulation argument works for dreams, but selectively, as dream content is different from most normal life content.

Yes, but it knows all of Bostrom's articles, maybe because it has seen the list a hundred times. 

Most LLM replies can be improved by repeatedly asking "Improve the answer above"; this is similar to the test-time compute idea and to diffusion. 

In most cases, I can get better answers from LLMs just by asking "Improve the answer above."

In my experience, the improvements are observable for around 5 cycles, but after that the result either stops improving or gets stuck in some error mode and can't jump to a new level of thinking. My typical test subject: "draw a world map as text art." In good improvement sessions with Sonnet, it eventually adds grids and correct positions for continents.

One person on Twitter (I lost the link, maybe @goodside) automated this process: he first asked Claude to write code for automated prompting, and after 100 improvement cycles over an entire night, using many credits, he got much better code for a game. I repeated this experiment with my own tasks.
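As a rough illustration of what such automation can look like, here is a minimal sketch of the copy-paste loop, assuming the Anthropic Python SDK; the model name, task, and cycle count are my own placeholders, not the original author's code:

```python
# Minimal sketch of an automated "Improve the answer above" loop.
# Assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY in the environment;
# the model name, task, and cycle count are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()

TASK = "Draw a world map as text art."
IMPROVE = "Improve the answer above."
CYCLES = 5  # in my experience, improvements tend to plateau after ~5 cycles

messages = [{"role": "user", "content": TASK}]
for cycle in range(CYCLES + 1):
    reply = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=2000,
        messages=messages,
    )
    answer = reply.content[0].text
    print(f"--- cycle {cycle} ---\n{answer}\n")
    # Mechanically append the content-independent improvement prompt.
    messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": IMPROVE})
```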

I tried different variants of "improve it", such as adding critiques or generating several candidate answers within one reply. I also tried a meta-level approach, where I asked the model to improve not only the answer but also the improvement prompt itself.

I started these experiments before the test-time compute idea went mainstream, and this looks like one form of test-time compute. The process also resembles diffusion.

The main question here: in which cases does the process quickly get stuck, and in which does it produce unbounded improvements? It seems to get stuck in local minima, and in situations where the model's intelligence isn't sufficient to see ways to improve or to tell better versions from worse ones. It also can't jump to another valley: once it has started improving in some direction, it keeps pushing in that direction, ignoring other possibilities. Only manually starting another chat window helps to change valleys.

Iterative improvement of images also works in GPT-4o, but not in Gemini 2.5 Pro, and o1 is also bad at improving, progressing very slowly. It seems that test-time improving conflicts with test-time reasoning.

Results for "Improve it": https://poe.com/s/aqk8BuIoaRZ7eDqgKAN6 

Variants of the main prompt: "Criticize the result above and iteratively improve it" https://poe.com/s/A2yFioj6e6IFHz68hdDx 

This prompt - "Create a prompt X for iterative improvement of the answer above. Apply the generated prompt X." - converges quickly to extraordinary results but overshoots, like creating games instead of drawings. It also uses thinking: https://poe.com/s/cLoB7gyGXHNtwj0yQfPf 
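To make the meta-level variant concrete, here is a hedged sketch under the same assumptions as the loop above; splitting it into a "generate prompt X" step and an "apply X" step, and the helper function, are my own illustration:

```python
# Sketch of the meta-level variant: first ask the model to generate an
# improvement prompt X, then mechanically apply X after each reply.
# Same assumptions as the previous sketch; prompts and helper are illustrative.
import anthropic

client = anthropic.Anthropic()

def ask(messages):
    """Send the running conversation and return the model's text reply."""
    reply = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=2000,
        messages=messages,
    )
    return reply.content[0].text

history = [{"role": "user", "content": "Draw a world map as text art."}]
history.append({"role": "assistant", "content": ask(history)})

# Meta step: generate the improvement prompt X once.
history.append({"role": "user", "content":
    "Create a prompt X for iterative improvement of the answer above. "
    "Reply with prompt X only."})
prompt_x = ask(history)
history.append({"role": "assistant", "content": prompt_x})

# Apply the generated prompt X on every subsequent cycle.
for _ in range(5):
    history.append({"role": "user", "content": prompt_x})
    history.append({"role": "assistant", "content": ask(history)})

print(history[-1]["content"])
```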

The trick is that the improving prompt should be content-independent and mechanically copy-pasted after each reply.

It looks like (based on the article Anthropic published a few days ago about their "microscope" interpretability work) Claude Sonnet was trained to distinguish facts from hallucinations, so it's not surprising that it knows when it hallucinates.  

My thought was different: even if a simulation is possible, it needs the original for verification. 

Also, one way to run simulations is "physical simulation", as in The Truman Show or an alien zoo: a real planet with real human beings who live their lives, but the sky is not real beyond some distance, and there are thousands of such planets. 

Yes, to create simulations an AI needs some real humans to calibrate those simulations. And it needs simulations to predict the behaviour of other possible AIs it may meet in space, and of their progenitor civilizations.

If the AI successfully calibrates its simulations, it will not need humans; and once it has collected all the data it needs from the simulations, it will turn them off.  

Also, obviously, surviving only in simulations is still a disempowerment of humans; it can cause suffering on a large scale and the death of most people. 

A value handshake is a more promising way to ensure AI safety of this type. 
