yo-cuddles

Comments
So how well is Claude playing Pokémon?
yo-cuddles · 6mo

No, sorry, that's not a typo; it's a linguistic norm that I probably assumed was more common than it actually is.

The people I talk with and I have used the words "mumble" and "babble" to describe LLM reasoning, sort of like human babble: see https://www.lesswrong.com/posts/i42Dfoh4HtsCAfXxL/babble

So how well is Claude playing Pokémon?
yo-cuddles · 6mo

There's an improvement in LLMs I've seen that is important but has wildly inflated people's expectations beyond what's reasonable:

LLMs have hit a point in some impressive tests where they don't reliably fail past the threshold of being unrecoverable. They are conservative enough that they can do search on a problem, failing a million times until they mumble their way into an answer.

I'm going to try writing something of at least not-embarrassing quality about my thoughts on this, but I am really confused by people's hype around this sort of thing; it feels like directed randomness.
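Here's a minimal sketch of the dynamic I mean by "directed randomness" (the names and the success probability are invented, not taken from any real system): each attempt is almost always wrong, but failures are cheap and recoverable, so enough retries eventually stumble into an answer.

```python
import random

def attempt(p_success: float = 0.001) -> bool:
    """One hypothetical LLM attempt: almost always wrong, but a
    failure never leaves the problem in an unrecoverable state."""
    return random.random() < p_success

def search_until_success(max_tries: int = 1_000_000) -> int | None:
    """Directed randomness: keep sampling until an attempt checks out.
    This only works because failed attempts are cheap to discard."""
    for tries in range(1, max_tries + 1):
        if attempt():
            return tries  # "mumbled into" an answer after this many tries
    return None  # blew the budget without ever recovering

print(search_until_success())  # e.g. 847: looks like skill, is mostly retries
```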

A Bear Case: My Predictions Regarding AI Progress
yo-cuddles · 6mo

Gotcha. You didn't sound OVERconfident, so I assumed it was much-less-than-certain. Still refreshingly concrete.

A Bear Case: My Predictions Regarding AI Progress
yo-cuddles · 6mo

Ah, okay.

I'll throw in my moderately strong disagreement for future Bayes points. Respect for the short-term, unambiguous prediction!

How Much Are LLMs Actually Boosting Real-World Programmer Productivity?
Answer by yo-cuddles · Mar 05, 2025

This is not going to be a high-quality answer; sorry in advance.

I noticed this with someone in my office who is learning robotic process automation: people are very bad at measuring their productivity; certain kinds of gains and certain kinds of losses are much more visible to them than others. I know someone who swears emphatically that he is many times as productive, but he has become almost totally unreliable. He's in denial over it, and a couple of people have now openly told me they try to remove him from workflows because of all the problems he causes.

I think the situation is like this:

If you finish a task very quickly using automated methods, that feels viscerally great and, importantly, is very visible. If your work then incurs time costs later, you might not be able to trace that extra cost back to the "automated" tasks you set up earlier, doubly so if those costs are absorbed by other people catching what you missed and correcting your mistakes, or doing the things that used to get done when you did the task manually.

I imagine it is hard to track a bug and know, for certain, that you wasted that time because you used an LLM instead of just doing it yourself. You don't know who else had to waste time fixing your problem because the LLM's code is spaghetti, or at least you don't feel it in your bones the way you feel increases in your output. You don't get to see the counterfactual project where things just went better in intangible ways because you didn't outsource your thinking to GPT. Few people notice, after the fact, how many problems they incurred because of a specific thing they did.

I think LLM usage is almost ubiquitous at this point; if it were conveying big benefits, it would show more clearly. If everyone is saying they are 2x more productive (which is kinda low by some testimonies), then it is probably the case that they are just oblivious to the problems they are causing for themselves, because those problems are less visible.
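As a toy back-of-the-envelope (every number here is invented; the point is the visibility asymmetry, not the magnitudes), the accounting looks something like this:

```python
# All numbers invented for illustration.
hours_saved_visible = 2.0       # the task you watched finish fast
own_debug_hours_later = 1.0     # only sometimes traced back to that task
colleagues_cleanup_hours = 2.5  # absorbed by others, almost never traced back

felt_gain = hours_saved_visible            # what the user perceives
true_gain = (hours_saved_visible
             - own_debug_hours_later
             - colleagues_cleanup_hours)   # what actually happened

print(f"felt: +{felt_gain:.1f}h, actual: {true_gain:+.1f}h")
# felt: +2.0h, actual: -1.5h
```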

A Bear Case: My Predictions Regarding AI Progress
yo-cuddles · 6mo

By "solve", what do you mean? Like, provably secure systems, create a AAA game from scratch, etc?

I feel like any system that could do that would implicitly have what the OP says these systems might lack, but you seem to be in half agreement with them. Am I misunderstanding something?

A Bear Case: My Predictions Regarding AI Progress
yo-cuddles · 6mo

Definitely! However, there is more money and "hype" in the direction of wanting these to scale into AGI.

Hype and anti-hype don't cancel each other out. If someone invests a billion dollars into LLMs, someone else can't spend negative one billion to cancel it: the billion-dollar spender is the one moving markets and getting a lot of press attention.

We have Yudkowsky going on Destiny, I guess?

The Game Board has been Flipped: Now is a good time to rethink what you’re doing
yo-cuddles · 7mo

I think there's some miscommunication here, on top of a fundamental disagreement on whether more compute takes us to AGI.

On the miscommunication: we're not talking about the falling cost per FLOP, we're talking about a world where OpenAI either does or does not have a price war eating its margins.

On the fundamental disagreement: I assume you don't take very seriously the idea that AI labs are seeing a breakdown of scaling laws? No problem if so; reality should resolve that disagreement relatively soon!

The Game Board has been Flipped: Now is a good time to rethink what you’re doing
yo-cuddles · 7mo

This is actually a good use case, one which fits what GPT does well and where very cheap tokens help!

Pending some time for people to pick at it and test its limits, this might be really good. My instinct is that legal research, case law, etc. will be the test of how good it is; if it does well there, this might be its foothold into real commercial use that actually generates profit.

My prediction is that we will be glad this exists. It will not be "PhD level", a phrase which defaces all who utter it, but it will save some people a lot of time and effort.

Where I think we disagree: this will likely not elicit a Jevons-paradox scenario where we collectively spend much more money on LLM tokens despite their decreased cost. A killer app this is not.

My prediction is that low-level users will use this infrequently, because Google (or vanilla ChatGPT) is sufficient: what they are looking for is not a report but a webpage, and one likely at the top of their search already. Even if it would save them time, they will never use it so often that their first instinct would be deep research rather than Google; they will not recognize where deep research would be better, and they won't change their habits even if they do.

On the far end, some grad students will use this to get them started, but it will not do the work of actually doing the research. Besides paywalls disrupting things and limits on important physical media, there is a high likelihood that this won't replace any of the actual research grad students (or lawyers, paralegals, etc.) will have to do. The number of hours they spend won't be much affected; the range of users who will find much value will be few, and they probably won't use it every day.

I expect that, by token usage, deep research will not be a big part of what people use ChatGPT for. If I'm wrong, I predict it's because the legal professions found a use for it.

I will check back in one year (if we're alive) to see if this pans out!

The Game Board has been Flipped: Now is a good time to rethink what you’re doing
yo-cuddles · 7mo

Also, Amodei needs to cool it. There's a reading of the things he's been saying lately that could be taken as sane, but also a plausible reading that makes him look like a buffoon. Credibility is a scarce resource.
