Pablo Villalobos

Software engineer


What does failure look like?

Some things that come to mind. I'm not sure if this is what you mean, and they are very general, but it's hard to get more concrete without narrowing down the question:

  • Goodharting: you might make progress towards goals that aren't exactly what you want. Perhaps you optimize for getting more readers for your blog but the people you want to influence end up not reading you.
  • Value drift: you temporarily get into a lifestyle that later you don't want to leave. Like starting a company to earn lots of money but then not wanting to let go of it. I don't know if this actually happens to people.
  • Getting stuck in perverse competition: you get into academic research to fix all the problems but the competitive pressure leaves you no slack to actually change anything.
  • Neglecting some of your needs: you work a lot and seem to be accomplishing your goals, but you lose contact with your friends and slowly become lonely and lose motivation.
Pablo Villalobos's Shortform

I'm not sure if using the Lindy effect for forecasting x-risks makes sense. The Lindy effect states that with 50% probability, things will last as long as they already have. Here is an example for AI timelines.

The Lindy rule works great on average, when you are making one-time forecasts of many different processes. The intuition for this is that if you encounter a process with lifetime T at time t<T, and t is uniformly random in [0,T], then on average T = 2*t.
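Spelled out in symbols, under the same assumption that t is uniform in [0, T] (this is only a restatement of the intuition above, nothing more):

```latex
P(T - t \le t) = P\!\left(t \ge \tfrac{T}{2}\right) = \frac{1}{2},
\qquad
\mathbb{E}[t] = \frac{T}{2} \;\Rightarrow\; T = 2\,\mathbb{E}[t].
```

So the remaining lifetime T - t is at most the elapsed time t exactly half of the time, which is the 50% statement of the Lindy rule.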

However, if you then keep forecasting the same process over time, then once you surpass T/2 your forecast becomes worse and worse as time goes by. It is precisely when t is very close to T that you are most confident that T is a long time away. If forecasting this particular process is very important (e.g. because it's an x-risk), then you might be in trouble.

Suppose that some x-risk will materialize at time T, and the only way to avoid it is doing a costly action in the 10 years before T. This action can only be taken once, because it drains your resources, so if you take it more than 10 years before T, the world is doomed.

This means that you should act iff you forecast that T is less than 10 years away. Let's compare the Lindy strategy with a strategy that always forecasts that T is <10 years away.

If we simulate this process with uniformly random T, for values of T up to 100 years, the constant strategy saves the world more than twice as often as the Lindy strategy. For values of T up to a million years, the constant strategy is 26 times as good as the Lindy strategy.
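A minimal sketch of one way to set up such a simulation. The modelling choices are my assumptions and not necessarily the ones behind the numbers above: T is drawn uniformly from [0, t_max], the forecaster starts watching at a time t drawn uniformly from [0, T], and the Lindy strategy forecasts that the remaining time equals the elapsed time t. The names `simulate`, `t_max`, `horizon` and `n_trials` are hypothetical.

```python
import random


def simulate(t_max, horizon=10.0, n_trials=200_000, seed=0):
    """Return (constant_success_rate, lindy_success_rate).

    Assumptions (not from the original comment): T ~ Uniform(0, t_max),
    and the forecaster first looks at the process at t ~ Uniform(0, T).
    """
    rng = random.Random(seed)
    constant_wins = 0
    lindy_wins = 0
    for _ in range(n_trials):
        T = rng.uniform(0.0, t_max)  # true time at which the risk materializes
        t = rng.uniform(0.0, T)      # time at which we start forecasting

        # The one-shot action only works inside the last `horizon` years.
        in_window = (T - t) <= horizon

        # Constant strategy: always forecasts "less than `horizon` years
        # left", so it acts immediately at time t.
        if in_window:
            constant_wins += 1

        # Lindy strategy: forecasts remaining time == elapsed time t, so it
        # acts only if t < horizon. Since t only grows, a forecaster who
        # does not act now never will.
        if t < horizon and in_window:
            lindy_wins += 1

    return constant_wins / n_trials, lindy_wins / n_trials


if __name__ == "__main__":
    constant, lindy = simulate(t_max=100.0)
    print(f"constant: {constant:.3f}  lindy: {lindy:.3f}  "
          f"ratio: {constant / lindy:.2f}")
```

Under these assumptions the constant strategy comes out more than twice as good as Lindy for T up to 100 years. The ratio for much larger ranges depends on details such as how t is sampled and how often forecasts are updated, so this sketch should not be expected to reproduce the exact figures quoted above.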

Wait, how is Twilight Princess a retro game? It's only been 16 years! I'm sorry but anything that was released during my childhood is not allowed to be retro until I'm like 40 or so.

We transhumanists want immortality... But is it really possible?

Let me put on my sciency-sounding mystical speculation hat:

Under the predictive processing framework, the cortex's only goal is to minimize prediction error (surprise). This happens in a hierarchical way, with predictions going down and evidence going up, and upper levels of the hierarchy are more abstract, with less spatial and temporal detail.

A visual example: when you stare at a white wall, nothing seems to change, even though the raw visual perceptions change all the time due to light conditions and whatnot. This is because all the observations are consistent with the predictions.

As the brain learns more, you get less and less surprise, and the patterns you see are more and more regular. A small child can play the same game a hundred times and it's still funny, but adults often see the first episode of a TV show and immediately lose interest because "it's just another mystery show, nothing new under the sun".

This means that your internal experience becomes ever more stable. This could explain why time seems to pass much faster the older you get.

Maybe, once you have lived long enough and your posthuman mind has accumulated enough knowledge and gets surprised even less, you eventually understand everything that there is to understand. Your internal experience is something like "The universe is temporally evolving according to the laws of physics, nothing new under the sun".

At that moment your perception of time stops completely, and your consciousness becomes a reflection of the true nature of the universe, timeless and eternal.

I think that's what I would try to do with infinite time, after I get bored of playing videogames.

Why do you think this sort of training environment would produce friendly AGI?
Can you predict what kind of goals an AGI trained in such an environment would end up with?
How does it solve the standard issues of alignment like seeking convergent instrumental goals?

Ukraine Post #9: Again

Re: April 5: TV host calls for killing as many Ukrainians as possible.

I know no Russian, but some people in the responses are saying that the host did not literally say that. Instead he said some vague "you should finish the task" or something like that. Still warmongering, but presumably you wouldn't have linked it if the tweet had not included the "killing as many Ukrainians as possible" part.

Could someone verify what he says?


I'm sorry, but I find the tone of this post a bit off-putting. Too mysterious for my taste. I opened the substack but it only has one unrelated post.

I don’t think there is a secular way forward.

Do you think that there is a non-secular way forward? Did you previously (before your belief update) think there is a non-secular way forward?

We just shamble forward endlessly, like a zombie horde devouring resources, no goal other than the increase of some indicator or other.

I can agree with this, but... those indicators seem pretty meaningful for me. Life expectancy, poverty rates, etc. And at least now we have indicators! Previously there wasn't even that!

And why does this kind of mysticism attract so many people over here? Why are the standard arguments against religion/magic and for materialism and reductionism not compelling to you anymore?

Can an economy keep on growing?

Let me paraphrase your argument, to see if I've understood it correctly:

  • Physical constraints on things such as energy consumption and dissipation imply that current rates of economic growth on Earth are unsustainable in the relatively short term (<1000 years), even taking into account decoupling, etc.

  • There is a strong probability that expanding through space will not be feasible

  • Therefore, we can reasonably expect growth to end some time in the next centuries

First of all, if economic progress keeps being exponential then I think it's quite possible that technological progress will mostly continue at previous rates.

So in 100-200 years, it certainly seems possible that space expansion will become much easier, if for example genetic engineering allows humans to better tolerate space environments.

But that's pretty much a "boring world" scenario where things keep going mostly as they are now. I expect the actual state of humanity in 200 years will be extremal: either extinction or something very weird.

Material needs, entertainment, leisure... are basically all covered for most people in rich countries. If you think about what could provide a substantial increase in utility to a very rich person nowadays, I think it's down to better physical health (up to biological immortality), mental health, protection from risks... and after all of that you pretty much have to start providing enlightenment, eudaimonia or whatever if you want to improve their lives at all.

So when you have a stable population of immortal enlightened billionaires... Well, perhaps you've reached the peak of what's possible and growth is not necessary anymore. Or perhaps you've discovered a way to hack physics and energy and entropy are not important anymore.

So, even if 200 years is a short amount of time by historic standards, the next 200 years will probably produce changes big enough that physical constraints that we would reach in 300 years at current trends stop being relevant.

Book review: "A Thousand Brains" by Jeff Hawkins

So, assuming the neocortex-like subsystem can learn without having a Judge directing it, wouldn't that be the perfect Tool AI? An intelligent system with no intrinsic motivations or goals?

Well, I guess it's possible that such a system would end up creating a mesa optimizer at some point.

"A PP-based AGI would be devilishly difficult to align"

Is this an actual belief or a handwavy plot device? If it's the former, I'm curious about the arguments.
