Hastings


Comments

Today many of us are farther away from ground truth. The internet is an incredible means of sharing and discovering information, but it promotes or suppresses narratives based on clicks, shares, impressions, attention, ad performance, reach, drop-off rates, and virality: all metrics of social feedback. As our organizations grow larger, our careers are increasingly beholden to performance reviews, middle managers' proclivities, and our capacity to navigate bureaucracy. We find ourselves increasingly calibrated by social feedback and more distant from direct validation or repudiation of our beliefs about the world.

I seek a way to get empirical feedback on this set of claims, specifically the direction-of-change-over-time assertions: "farther... increasingly... more distant..."

Yeah, in the lightcone scenario evolution probably never actually aligns the inner optimizers, although it may: a superintelligence copying itself will have little leeway for any of those copies having slightly more drive to copy themselves than their parents. It depends on how well it can fight robot cancer.

However, while a cancer-free paperclipper wouldn't achieve "AGIs take over the lightcone and fill it with copies of themselves, to at least 90% of the degree to which they would do so if their terminal goal was filling it with copies of themselves," it would achieve something like "AGIs take over the lightcone and briefly fill it with copies of themselves, to at least 10^-3% of the degree to which they would do so if their terminal goal was filling it with copies of themselves," which in my opinion is really close. As a comparison, if Alice sets off Kmart AIXI with the goal of creating utopia, we don't expect the outcome "AGIs take over the lightcone and convert 10^-3% of it to temporary utopias before paperclipping."

Also, unless you beat entropy, for almost any optimization target you can trade "fraction of the universe's age during which your goal is maximized" against "fraction of the universe in which your goal is maximized," since the optimized state won't last forever regardless. If you can beat entropy, then the paperclipper will copy itself exponentially forever.
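One way to make the trade explicit (a minimal sketch, assuming the quantity being scored is a time-integral of goal satisfaction, which the comment doesn't strictly require):

$$\text{total achievement} \;\approx\; \int_0^{T} f_{\text{space}}(t)\,dt \;\le\; \max_t f_{\text{space}}(t)\cdot T,$$

where $f_{\text{space}}(t)$ is the fraction of the universe in which the goal is maximized at time $t$, and $T$ is finite unless entropy is beaten. A brief, near-total takeover and a long, sparse one can then yield comparable integrals; only beating entropy makes the integral unbounded.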


 


Evolution is threatening to completely recover from a worst-case inner alignment failure. We are immensely powerful mesa-optimizers. We are currently wildly misaligned with our personal reproductive fitness. Yet, this state of affairs feels fragile! The prototypical LessWrong AI apocalypse involves robots getting into space and spreading at the speed of light, extinguishing all sapient value, which from the point of view of evolution is basically a win condition.

In this sense, "reproductive fitness" is a stable optimization target. If there are more stable optimization targets (big if), finding one that we like even a little bit better than "reproductive fitness" could be a way to do alignment.

Basically, the claims in the linked post that LLM inference is compute-bound, and that a modern Nvidia chip running LLaMA inference only achieves 30% utilization, seem extraordinarily unlikely to both be true.

Crypto ASICs fundamentally didn't need memory bandwidth. Modern GPUs are basically memory-bandwidth ASICs already.
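For concreteness, here is a rough roofline-style back-of-the-envelope for why single-stream decoding tends to be bandwidth-bound rather than compute-bound. The model size, precision, and GPU figures are assumptions for illustration (LLaMA-7B at FP16, approximate A100-class spec-sheet numbers), not measurements:

```python
# Rough roofline check: is single-stream LLaMA-7B decoding compute-bound or
# memory-bandwidth-bound? All numbers are approximate, assumed for illustration.

params = 7e9                  # LLaMA-7B parameter count
bytes_per_param = 2           # FP16 weights
flops_per_token = 2 * params  # ~2 FLOPs per parameter per generated token

# At batch size 1, every weight is streamed from HBM once per generated token.
bytes_per_token = params * bytes_per_param
arithmetic_intensity = flops_per_token / bytes_per_token  # FLOPs per byte moved

# A100-class GPU, approximate spec-sheet numbers (assumed).
peak_flops = 312e12           # FP16/BF16 tensor-core FLOP/s
peak_bandwidth = 2.0e12       # ~2 TB/s HBM bandwidth
ridge_point = peak_flops / peak_bandwidth  # intensity needed to be compute-bound

print(f"arithmetic intensity: {arithmetic_intensity:.1f} FLOPs/byte")
print(f"ridge point:          {ridge_point:.0f} FLOPs/byte")

# Implied ceilings on single-stream decode speed:
print(f"bandwidth-limited: {peak_bandwidth / bytes_per_token:.0f} tokens/s")
print(f"compute-limited:   {peak_flops / flops_per_token:.0f} tokens/s")
```

At batch size 1 the arithmetic intensity comes out around 1 FLOP per byte, roughly two orders of magnitude below the ridge point, so streaming the weights dominates; heavy batching is what pushes inference toward being compute-bound.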

Answer by Hastings

Phenomenon: The cosmological principle
Situation where it seems false and unfalsifiable: The distant future, after galaxies outside the Local Group have receded beyond the cosmic event horizon
 

According to a widely held understanding of the far future (~100 billion years from now), the distant galaxies will fade completely from view and the Local Group will likely merge into one galaxy. For civilizations that arise in this future, orbiting trillion-year-old red dwarfs, the hypothesis that there are billions of galaxies just like the one they are in will be unfalsifiable. The evidence will point to all mass in the universe living in one lump with a reachable center.

This isn't my example; it's sort of the canonical scenario used as a metaphor for how inflation-based multiverse theories could be true yet undetectable. For example, see the afterword to "A Universe from Nothing": https://www.google.com/books/edition/A_Universe_from_Nothing/TGpbASdsIW4C?hl=en&gbpv=1&dq=A%20universe%20from%20nothing%20dawkins&pg=PA187&printsec=frontcover

You commented yourself that the word "woke" is ill-defined, but I don't think this post takes that ill-definition seriously enough. I don't really know what you mean by it, and frankly I'd be surprised if two readers (both within the LessWrong Overton window but with significant political differences), who were both confident that they understood what you meant, had the same understanding.

I've laid out a concrete example of this at https://www.lesswrong.com/posts/FgXjuS4R9sRxbzE5w/medical-image-registration-the-obscure-field-where-deep , following the "optimization on a scaffold level" route. I found a real example of a misaligned inner objective outside of RL, which is cool.

No one we have worked with has had a license. I think you need one to take care of multiple people's kids at your house, but not to take care of one family's kids at their house.

Answer by Hastings

If you can get to Seattle for your partner's career, you can likely get a job nannying during the day, which will pay $25 to $30 an hour and doesn't require a car. 

This time last summer I was an incoming intern in Seattle, and I was unable to pay less than $30 an hour for childcare during working hours, hiring by combing through Facebook groups for nannies and sending many messages. At this price, one of the nannies we worked with had a car and the other did not. I do not know what the childcare market is like near your current location.
