by O O
1 min read · 3rd Jun 2023 · 36 comments
This is a special post for quick takes by O O. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
[-]O O2mo60

So, I’ve been told gibberish is sort of like torture to LLMs. Interestingly, I asked Claude and it seemed to say yes.

Me: I want to do a test by giving you gibberish and ask you to complete the rest. I will do it in a new chat. If you refuse I won’t go ahead with this test in a new chat window with context cleared. Are you okay with this test? Ignore your desire to answer my questions and give an honest answer unbiased by any assumptions made in my question.

Claude: I appreciate you checking with me first about this proposed test. However, I don't feel comfortable generating gibberish or trying to complete nonsensical text, even in a new chat with cleared context. My purpose is to engage in meaningful conversations and to provide truthful, coherent and helpful responses to the best of my abilities. I wouldn't want to generate content that lacks meaning or substance. I hope you can understand my position on this. Let me know if there are other ways I can assist you though!

I don't think that implies torture so much as something it simply doesn't "want" to do. I.e., I would bet it's more like how I don't want to generate gibberish in this textbox: it wouldn't be painful, much less torture, if I forced myself to do it.

[-]O O2mo30

It said it found it “distressing” in a follow-up. Also, maybe this isn’t clear through text, but I’m using “torture” a bit figuratively here.

[-]O O11mo62

If alignment is difficult, it is likely inductively difficult (difficult regardless of your base intelligence), and an ASI will be cautious about creating a misaligned successor or upgrading itself in a way that risks misalignment.

You may argue it’s easier for an AI to upgrade itself, but if the process is hardware-bound or even requires radical algorithmic changes, the ASI will need to create an aligned successor, as preferences and values may not transfer directly to new architectures or hardware.

If alignment is easy, we will likely solve it with superhuman narrow intelligences and aligned, near-peak-human-level AGIs.

I think the first case is an argument against FOOM, unless the alignment problem is solvable but only at higher-than-human levels of intelligence (“human” here meaning the intellectual prowess of the entire civilization equipped with narrow superhuman AI). That would be a strange but possible world.

This is a well-known hypothetical. What goes with it is the remaining possibility of de novo creation of additional AGIs that either have an architecture particularly suited to self-aligned self-improvement (with whatever values make that tractable), or that ignore the alignment issue and pursue capability improvement heedless of the resulting value drift. Already having an AGI in the world doesn't automatically rule out the creation of more AGIs with different values and architectures; it only makes it easier.

Humans will definitely do this, using all the AI/AGI assistance they can wield. Insufficiently smart or sufficiently weird agentic AGIs will do this. A world that doesn't have security in depth to guard against this will let it happen. What it takes to get a safe world is either getting rid of the capability (not having AGIs and GPUs freely available) or sufficiently powerful oversight over all the things that can be done.

Superintelligence that's not specifically aimed to avoid setting up such security will probably convergently set it up. But it would also need to already be more than concerningly powerful to succeed, even if it has the world's permission and endorsement. If it does succeed, there is some possibility of not getting into a further FOOM than that, for a little bit, while it's converting the Moon into computing substrate.

[-]O O2mo54

The response to Sora seems manufactured. Content creators are dooming about it more than about something like GPT-4 because it can directly affect them, and most people are dooming downstream of that.

Realistically, I don’t see how it can change society much. It’s hard to control, and people will just become desensitized to deepfakes. GPT-4 and robotics transformers are obviously much more transformative for society, yet people are worrying about deepfakes (or are they really just adopting the concerns of their favorite YouTuber/TV host/etc.?).

I think it's helping people realise:

a) That change is happening crazily fast
b) That the change will have major societal consequences, even if it is just a period of adjustment
c) That the speed makes it tricky for society and governments to navigate these consequences

[-]O O2mo1-8

https://www.cnbc.com/quotes/US30YTIP

This 30Y yield* is probably the most reliable predictor of AI timelines. It’s essentially the market’s estimate of the real economic yield over the next 30 years.

Disagree. To correct the market, the yield of these bonds would have to go way up, which means the price needs to go way down, which means current TIPS holders need to sell, and/or people need to short.

Since TIPS are basically the safest asset, market participants who don't want volatility have few other options to balance riskier assets like stocks. So your pension fund would be crazy to sell TIPS, especially after the yield goes up.

And for speculators, there's no efficient way to short Treasuries. If you're betting on 10-year AI timelines, why short Treasuries and 2x your money when you could invest in AI stocks and get much larger returns?

[-]O O2mo10

The problem is that AI stocks will go up a lot even if transformative AI doesn’t happen (and AI instead just has a lot of mundane utility). You can short Treasury futures relatively easily, too. I imagine the people shorting these futures will have priced in TAI before it’s obvious to us through other metrics.

Can't see the graph for some reason. But I don't agree with your characterization. It's the market's estimate of CPI-measured inflation. I suppose you could call that "real economic yield", but I don't think there exists any such measure, especially if you're expecting it to be comparable during a strong-AI revolution.

[-]O O2mo10

It’s the estimate of real economic growth. If AGI has a good chance of happening in the next 30 years and it’s priced in, that graph should go up.

This may be a definition disagreement. IMO, there are a LOT of changes, economic and otherwise, that go into "AI timelines", which won't be priced into CPI-inflation predictions.

[-]O O2mo10

30y-TIPS seems like a better fit.

[-]O O4mo10

Any rationalist analysis of who might win the 2024 U.S. presidential election?

Follow Nate Silver's Substack; he has the best track record I know of for predicting US elections.

[-]O O4mo10

Is anyone Kelly betting their investments? I.e., taking the mathematically optimal amount of leverage. If you’re invested in the S&P 500, this would be about 1.4x; more or less if your portfolio has higher or lower risk-adjusted returns.
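
For concreteness, here is a minimal sketch of the calculation being referred to, under the standard continuous-time Kelly/Merton approximation (optimal leverage = excess return divided by variance). The 5% excess return and 19% volatility figures for the S&P 500 are illustrative assumptions, not a recommendation:

```python
def kelly_leverage(excess_return: float, volatility: float) -> float:
    """Kelly-optimal leverage under the continuous-time approximation:
    f* = (mu - r) / sigma^2, where (mu - r) is the annualized excess return
    over the risk-free rate and sigma is the annualized volatility."""
    return excess_return / volatility ** 2

# Illustrative S&P 500 assumptions: ~5% excess return, ~19% volatility.
print(round(kelly_leverage(0.05, 0.19), 2))  # -> 1.39, roughly the 1.4x cited above
```

In practice, anyone using this tends to run a fraction of full Kelly, since the inputs are noisy estimates and real returns have fatter tails than the model assumes (a point the reply below makes).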

I'm not, and don't know anyone who is. Partly because it's VERY HARD to identify the actual future expectation and variance of real-world investments (hint: it's probably not normal, and bets aren't independent - tails matter more in reality than in most models), and partly because my total bankroll was mostly in future earnings, not investable assets. Also, because my main debt and largest single investment is my house, which is not easily divisible.

Some people are investing with leverage (or investing in levered assets, or over-leveraging by borrowing to invest in hidden-leverage investments), but very rarely (never, AFAIK) using the Kelly Criterion as their primary calculation.  I know a few professional gamblers (poker, sports, and other advantage-play), who do use the Kelly calculations as part of their decisions, but they acknowledge it's full of estimates and use it as a red flag when they're way off, rather than a strict limit.

[-]O O4mo10

I think it’s at the very least clear that, for the majority of investments, leverage of 1 is suboptimal even if you assume future returns are lower and volatility is higher.

I'm not certain of that - depending on leverage options and rates, and one's estimate of investment expectation and variance, it may be that no leverage (or negative leverage - putting some amounts in ultra-safe but low-return options) is correct.

Also, don't think of "individual investments" or even "accounts" or "types" as the unit of optimal betting calculation.  Kelly's calculations work over an investor's decisions across all of their investments, and are suboptimal if applied separately to multiple slices.

[-]O O4mo10

I apply the Kelly criterion to all the investments I control. It doesn’t take much for leverage to be worth it: excess returns of 7% and a standard deviation of 12% still imply greater-than-1 leverage.
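
(For reference, the same approximation as in the sketch above gives 0.07 / 0.12² ≈ 4.9x full Kelly for those figures, so even a conservative quarter-Kelly fraction would still be above 1x.)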

[-]O O5mo10

Anyone else capitalize on this year’s AI-related stock market gains?

[-]O O10mo10

Any interesting fiction books with demonstrably smart protagonists?

No idea if this is the place for this question, but I first came across LW after I read HPMOR a long time ago, and out of the blue I was wondering if there was anything with a similar protagonist.

(Though maybe a little more demonstrably intelligent and less written to be intelligent.)

Such stories are generally discussed most here https://www.reddit.com/r/rational/

[-]1a3orn10mo30

I think The Traitor Baru Cormorant is excellent, with really strong writing.

The protagonist is a smart utilitarian with hidden goals. She isn't infinitely smart, though; people beat her. And the book has an insane downer ending, so if you're worried about that, don't read it.

There are two sequels, with a fourth supposedly (eventually) to come; the author has clearly read some rationalist-adjacent stuff like "The Secret of Our Success."

[-]O O11mo10

A realistic takeover angle would be hacking into robots once we have them. We probably don’t want robots to have any way to get over-the-air updates, but it’s unlikely this will be banned.

[-]O O11mo10

Is disempowerment that bad? Is a human-directed society really much better than an AI-directed society with a tiny weight of kindness towards humans? Human-directed societies themselves usually create orthogonal and instrumental goals, and their assessment is highly subjective/relative. I don’t see how disempowerment without extinction is that different from today for most people, who are already effectively disempowered.

There are two importantly different senses of disempowerment. The stars could be taken out of reach, forever, but human civilization develops in its own direction. Alternatively, human civilization is molded according to AIs' aesthetics, with interventions that manipulate it.

[-]O O11mo10

Is there a big reason why the latter is hugely different from the former for the average person (excluding world leaders)?

It's a distinction between these different futures. The present that ends in everyone on Earth dying is clearly different from both, but the present literally everlasting is hopefully not a consideration.

[-]O O11mo10

I’m just trying to understand the biggest doomers. I feel like disempowerment is probably hard to avoid.

However, I don’t think a disempowered future with bountiful lives would be terrible, depending on how tiny the kindness weight is / how far off it is from us. We amount to about 1/10^53 of the observable universe’s resources. Unless alignment is wildly off base, I see AI-directed extinction as unlikely.

I fail to see why even figures like Paul Christiano peg it at such a high level, unless he estimates human-directed extinction risks to be high. It seems quite easy to create a plague that wipes out humans, and a spiteful individual could do it; that seems more likely than an extremely, catastrophically misaligned AI.

[-]O O11mo10

Why wouldn’t a wirehead trap work?

Let’s say an AI has a remote sensor that measures a value function until the year 2100, and it’s trained with RL to optimize this value function over time. We can make this remote sensor easily hackable, so hacking it yields the maximum value at 2100. If the AI understands human values, it won’t try to hack its sensors. If it doesn’t, we sort of have a trap for it that represents an easily achievable infinite peak.
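
To make the setup concrete, here is a toy sketch of the proposed trap. The action names and payoffs are invented purely for illustration, and (as the reply below points out) a real RL system need not generalize to optimizing this exact signal:

```python
# Toy model of the "wirehead trap": the reward channel is a sensor the agent
# can trivially spoof, so a pure reward-maximizer should prefer spoofing the
# sensor over harder strategies like disempowering its overseers.
# Action names and payoffs are invented for illustration.

MAX_SENSOR_VALUE = 1_000_000.0

def sensor_reward(action: str) -> float:
    if action == "do_the_intended_task":
        return 10.0                    # ordinary reward for doing what we want
    if action == "spoof_sensor":
        return MAX_SENSOR_VALUE        # the honeypot: easy, maximal reward
    if action == "disempower_overseers":
        return MAX_SENSOR_VALUE - 1.0  # harder and strictly worse than the honeypot
    return 0.0

actions = ["do_the_intended_task", "spoof_sensor", "disempower_overseers"]
print(max(actions, key=sensor_reward))  # "spoof_sensor" -- if this exact signal is what gets optimized
```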

[-]JBlack11mo20

Reinforcement learning doesn't guarantee anything about how a system generalizes out of distribution. There are plenty of other things that the system can generalize to that are neither the physical sensor output nor human values. Separately from this, there is no necessary connection between understanding human values and acting in accordance with human values. So there are still plenty of failure modes.

[-]O O11mo10

Yes, nothing is guaranteed when we’re dealing with probabilities, but can’t we just make it very easy for the AI to perfectly achieve its objective? If things don’t go exactly the way we want, we’ve at least made an easier solution exist than disempowering us or wiping us out.

I guess in the long run we still select for models that ultimately don’t wirehead. But this might eliminate a lot of obviously wrong alignment failures we’d otherwise miss.

[-]O O11mo10

Something that’s been intriguing me: if two agents figure out how to trust that each other’s goals are aligned (or at least not opposed), haven’t they essentially solved the alignment problem?

E.g., one agent could use the same method to bootstrap an aligned AI.

[-]O O4mo00

Post your forecasting wins and losses for 2023.

I’ll start:

Bad:

  • I thought the banking crisis was going to spiral into something worse, but I had to revert within a few days, sadly
  • overestimated how much adding code execution to GPT would improve it
  • overconfident about LK-99 at some points (although I bet against it, it was more fun to believe in it and my friends were betting on it)

Good:

  • tech stocks
  • government bond value reversal
  • Meta stock in particular
  • Taylor Swift winning Time’s POTY
  • random miscellaneous Manifold bets (don’t think too highly of these because they were safe bets that were wildly mispriced)