stochastic_parrot
Checking in on AI-2027
stochastic_parrot16d20

Thank you. You make a good case for including this as evidence that capabilities are increasing. I suppose the question is whether they are increasing at the rate needed for short timelines. I think it’s worth asking whether zero improvement in same-infrastructure performance over four months is something that would have been expected four months ago. Of course, this is only one metric, over a short timeframe.

Checking in on AI-2027
stochastic_parrot16d30

in August, Opus 4.1 was already scoring 80% on this benchmark.


Can someone explain why the SWE-bench Verified page still shows a top score of 75%, which has not changed since June? Are they delayed, or are they using different criteria?

The Consciousness Box
stochastic_parrot2y30

“Well, if I wasn’t conscious, I never would have pressed that start button”

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
stochastic_parrot2y64

Fairly minor but I think I see an unmentioned error in the "41" section:

the first six positive even integers: 2 + 4 + 6 + 8 + 10 + 11 = 41

11 is not even (it seems to be thinking somewhat of 42?)

Edit: Actually the more I think about it, it's a pretty interesting error. ChatGPT 4 produces much better answers than I would for the majority of these questions, but I don't think I would make this error. If you asked it, I'm sure it would correctly explain that the sum of even integers cannot be odd, or that 11 is not even, etc, but I wonder if (for example) a large amount of training text about the number 42 being the sum of the first six positive even integers was somehow "close" enough to "41" to overwhelm any emergent understanding of how to apply those concepts?
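
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The helper name `first_even_integers` is just something I made up for illustration; it confirms that the first six positive even integers sum to 42, and that the quoted sum only reaches 41 because 12 was swapped for the odd number 11.

```python
def first_even_integers(n):
    """Return the first n positive even integers (hypothetical helper)."""
    return [2 * k for k in range(1, n + 1)]


evens = first_even_integers(6)
total = sum(evens)
print(evens, total)  # [2, 4, 6, 8, 10, 12] 42

# The terms as quoted from the "41" answer: 11 appears where 12 should be.
quoted = [2, 4, 6, 8, 10, 11]
quoted_total = sum(quoted)
odd_terms = [x for x in quoted if x % 2 != 0]
print(quoted_total, odd_terms)  # 41 [11]
```

More generally, the sum of the first n positive even integers is n(n + 1), which is always even, so no such sum can equal 41.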

Muddling Along Is More Likely Than Dystopia
stochastic_parrot2y20

In 10 to 20 years, when tensor processors are cheap and power-efficient, it will be common for networks of self-replenishing autonomous drones to surveil and police vast areas of land.

Is there a betting market for this?

The Talk: a brief explanation of sexual dimorphism
stochastic_parrot2y40

A fascinating eukaryotic exception is the white-throated sparrow, which functionally has four genders in an equilibrium where the tan-striped males mostly mate with white-striped females and vice versa. (I first read about it in Joan Strassmann’s book Slow Birding; the Wikipedia page for the white-throated sparrow also has some introductory info. It seems to involve a chromosomal inversion.)
