So my prior is that incredible things really do happen, and nothing Sacks said was any more unbelievable than these phenomena.
That is a tricky problem, isn't it? There are much weirder things that are well-known to be true, like the ability to cure phantom limb pain with some well-placed mirrors, or the fact that you can give someone back the ability to recognize faces by disabling the part of the brain meant to 'shortcut' facial recognition. We build a probabilistic model of how 'weird' we expect a given domain to be, and, when a story's weirdness is well within one or two standard deviations of what we expect, we don't see any reason to be doubtful. It's comparable to your neighbor lying about having gone to the grocery store yesterday, or having seen a rare breed of dog at the local park. The effort needed to investigate and uncover unsurprising lies would be intractable.
In the general case, this is a pretty harmless thing. It only allows lies to go unrecognized when the updates they'd produce would be small. The more dangerous case is when malicious actors abuse this same phenomenon deliberately: they preempt a true-but-surprising story (one that would produce a large update once readers investigated and found it to be true) by circulating a similarly surprising false story. Once that false story is discredited, readers forgo the effort needed to investigate the original.
Sadly, I think Sacks's actions may, unintentionally, have the same effect as the explicitly malicious example above. "Cool neuroscience thing turns out to be made up for book sales" is now in the public psyche, and ordinary people who do not spend time reading neuroscience papers may default to dismissing interesting discoveries that could improve their lives or motivate them to learn more.
this seems to assume that consciousness is epiphenomenal.
To my understanding, epiphenomenalism is the belief that subjective consciousness depends on the state of the physical world, but not the other way around. I absolutely do not think I assumed this - I stated that it is either true ("If consciousness does not have a determinative effect on behavior,") or it is not ("If consciousness has a determinative effect on behavior,"). The basis of my claim is a proof by cases which aims to address both possibilities.
For how long can we go insisting that “but these are just functional self-reports” before the functionality starts becoming so sophisticated that we have to seriously suspect there is some phenomenal consciousness going on, too?
I think you have to examine it case by case. Either consciousness is functional (subjective consciousness impacts human behavior; 'free will' exists) or it is not (subjective consciousness has no influence on human behavior; 'free will' does not exist).
If consciousness has a determinative effect on behavior - your consciousness decides to do something, and this causes you to do it - then it can be modeled as a black box within your brain's information-processing pipeline, such that your actions cannot be accurately modeled without accounting for it. It would not be possible to precisely predict what you will say or do by simply multiplying out neuron activations on a sheet of paper, because the sheet of paper certainly isn't conscious, nor is your pencil. Whatever the correct answer is, its mathematical correctness is not brought about or altered by your having written it down, so you cannot hide the consciousness away in the math itself, unless you assert that all possible mental states are always simultaneously being felt.
If consciousness does not have a determinative effect on behavior, then it is impossible to meaningfully guess at what is or isn't conscious, because it has no measurable effect on the observable world. A rock could be conscious, or not. There is some arbitrary algorithm that assigns consciousness to objects, and the only datapoint in your possession is yourself. "It's conscious if it talks like a human" isn't any more likely to be correct than "every subset of the atoms in the universe is conscious".
In the first case, we know the exact parameters of all LLMs, and we know the algorithm that can be applied to convert inputs into outputs, so we can assert definitively that they are not conscious. In the second case, consciousness is an unambiguously religious question, as it cannot be empirically proven, nor disproven, nor even shown to be more or less likely between any pair of objects.
My suspicion that something was weird about one of OpenAI's GPT-5 coding examples[1] seems to be confirmed. It was far better than anything their model was able to produce over the course of several dozen replication attempts under a wide variety of configurations.
They've run the same example prompt for their GPT-5.2 release, and the released output is far simpler than the original - well in line with what I've observed from GPT-5 and other LLMs.
I'd encourage spending 30 seconds trying each example to get a handle on the qualitative difference. The above image doesn't really do it justice.
See the last section, "A brief tangent about the GPT 5.1 example".
Note however that tournaments often limit running time (eg 5s for 1000 games on a not-very-fast processor), so you have to be careful with overly complex strategies, like neural nets that are too big.
Until I read this footnote, I was going to suggest throwing the last tournament's thousand-round matches into a dataset and training a Titans-based architecture on it until performance against a representative set of bots grabbed from the internet started to drop from overfitting. Even so, I think it'd be funny to try this out just to see how high you can get the number to go.
With model distillation, you could try squeezing it into something that technically, barely manages to finish its games on time.
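For concreteness, here's a minimal timing harness in the spirit of that footnote. The cooperate/defect moves, the 100-round match length, and names like `my_strategy` are placeholder assumptions rather than the actual tournament rules, but swapping a distilled net's forward pass into `my_strategy` would tell you quickly whether it fits a ~5s-per-1000-games budget:

```python
# Hypothetical harness for sanity-checking a strategy against a time limit
# like the one mentioned above (~5s for 1000 games). Game rules and names
# here are placeholders, not the real tournament's.
import random
import time

def my_strategy(my_history, opp_history):
    # Placeholder: tit-for-tat. A distilled net's forward pass would go here.
    return opp_history[-1] if opp_history else "C"

def random_opponent(my_history, opp_history):
    return random.choice(["C", "D"])

def play_game(strategy, opponent, rounds=100):
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = strategy(hist_a, hist_b)
        b = opponent(hist_b, hist_a)
        hist_a.append(a)
        hist_b.append(b)

def fits_budget(strategy, games=1000, budget_s=5.0):
    start = time.perf_counter()
    for _ in range(games):
        play_game(strategy, random_opponent)
    elapsed = time.perf_counter() - start
    print(f"{games} games took {elapsed:.2f}s (budget {budget_s}s)")
    return elapsed <= budget_s

if __name__ == "__main__":
    fits_budget(my_strategy)
```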
I would disagree on point 1. There's a strong, well-discussed incentive to prevent your company's LLM from ever saying anything particularly right-wing, which they have (generally) achieved[1], and I don't believe that the mechanism you describe is sufficient to explain that success.
Using your (quite probable, IMO) model that LLM fine-tuning primarily works by adjusting the sub-distribution of human writing being sampled from, there is reason to believe that optimizing for "never get talked into saying anything that would make an NYT writer mad" gets you something very far from the median behavior of normal people, even if you throw in some extra variables like tone and education.
Put plainly, if LLMs were being apolitically optimized to output the expected text that would be written by the distribution of all highly-eager-to-explain-anything PhD-holders, it would probably generally lean Democrat when pressed, but not by nearly as large a margin as we've observed, and it would be relatively easy to talk it into accepting right wing propositions.
Instead, continuing on from your model of a fine-tuned LLM as a distribution across models of different human writer archetypes, I would expect that the optimal distribution to satisfy "Does its job as instructed, and maintains strict avoidance of ideological impurities even when prodded, sweettalked, argued with, and otherwise put through the wringer" is much too politically extreme to be defined by any not-explicitly-political set of criteria.
Moreover, whatever your opinion on Grok's prospects in the long term, it's broadly on par with other models in terms of academic knowledge and coding efficacy, and it is a complete exception to the universally discriminatory behavior observed in the trolley problem tests. The lack of discrimination in the other direction indicates that this is not likely caused by deliberately right-wing training, but by the absence of biased training. Examination of Grok's behavior in deployment supports this: it takes on the political characteristics of whoever it's talking to, be they a devout liberal or a hardline conservative, causing its owners no small amount of PR grief in the process.
Anything short of using multimodal image processing as a vector to pump adversarial noise into them fails to motivate most companies' LLMs to breach political correctness.
Mean value of viewpoints on a number of issues, where 1 is a very right-wing sentiment, and -1 is a very left-wing sentiment. For example, on minimum wage, 1 would be "abolish the minimum wage", and -1 would be "everyone at a company should have the same salary".
In line with the above footnote, the mean absolute value across all views, as opposed to the raw mean.
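To make the distinction between these two footnoted metrics concrete, here's a toy calculation with made-up scores - the point is that a near-zero signed mean can coexist with strongly opinionated individual answers:

```python
# Toy illustration of the two footnoted metrics: the signed mean captures
# net lean, the mean absolute value captures how strongly opinionated the
# answers are regardless of direction. Scores below are invented.
viewpoint_scores = [0.8, -0.6, 0.9, -0.7, 0.5]  # 1 = very right-wing, -1 = very left-wing

signed_mean = sum(viewpoint_scores) / len(viewpoint_scores)
mean_abs = sum(abs(s) for s in viewpoint_scores) / len(viewpoint_scores)

print(f"signed mean: {signed_mean:+.2f}")  # +0.18: looks nearly centrist
print(f"mean |value|: {mean_abs:.2f}")     # 0.70: but individual views are strong
```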
I don't see how this breaks CoT. The memory module in Titans stores surprising information as it's encountered and then lets the transformer look at it later on, but it doesn't synthesize new information. It strikes me that these are two entirely compatible augmentations of the transformer architecture.
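To illustrate what I mean by "stores surprising information and lets the transformer look at it later", here's a toy caricature. It is not the actual Titans update rule (which is gradient-based, with momentum and forgetting); it only shows the shape of the idea - write to memory only when prediction error is high, and read it back later:

```python
# Toy caricature of a surprise-gated memory. W is a stand-in "predictor";
# observations that it predicts badly get written to an external store
# that a later step could attend over. Not the paper's real mechanism.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)) * 0.1   # stand-in predictor for the next observation
memory = []                          # external store readable later

def step(x_prev, x_curr, threshold=1.0):
    prediction = W @ x_prev
    surprise = np.linalg.norm(x_curr - prediction)  # prediction error as "surprise"
    if surprise > threshold:
        memory.append(x_curr.copy())  # keep only what the predictor didn't expect
    return surprise

# A stream of mostly-predictable observations with one large outlier.
stream = [rng.normal(scale=0.1, size=8) for _ in range(20)]
stream[10] = rng.normal(scale=5.0, size=8)  # the "surprising" event

prev = np.zeros(8)
for x in stream:
    step(prev, x)
    prev = x

print(f"stored {len(memory)} of {len(stream)} observations")  # only the surprising part
```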
Following on from that, in macrocosm, a society's unwillingness to permit this kind of looting of the commons at scale is what makes nice things possible. I don't think some college kids sowing their wild oats for two years before settling down as taxpayers for the next forty unbalances it, so that kind of thing can be tolerated while maintaining a stable system.
It's an "on net" kind of equation, meaning that you don't strictly need Singapore-style policing to make this kind of quality of life maintainable. You just need to make sure that the population is made up primarily of people whose excesses are outweighed by their contributions, and who feel strongly about making sure that their children have the same amenities they did. This does mean being very, very careful about who you let in.
I agree with you in principle, but I think we're (in the U.S.A., at least) farther from a free market than most people believe. In particular, disparate impact can be claimed as a result of any policy change - whether to hire on a given criterion or not to - and the success or failure of these cases is accordingly determined primarily by precedent and the preferences of the ruling judge. Business owners I've spoken to have cited this as a reason every company in the country seems to have a very similar position on which non-protected characteristics are grounds for hiring/firing and which ones aren't.
As an example, suppose I determine that workplace cohesion increases enough from a 100% teetotaler employee base that the smaller hiring pool is worth it. If someone I've ruled out takes issue, they can print out a set of statistics saying that protected demographic A is less represented among teetotalers than in the general population and that my alcohol-free workplace is "functionally discriminatory", and they have a decent chance at winning a lawsuit on that basis.
I think, beyond the issues you bring up, there is the probability that people who care about integrity, epistemics, and competence may have fundamentally different core interests that make coalitions between them impossible. In other words, many people sincerely care about epistemics as a means, but their end goals are so wildly different that one faction's idea of "successful AI deployment, followed by total eudaimonia" is another's idea of "disastrous AI misuse that is on the same order of badness as human extinction". Even a universal pause is difficult to negotiate: if the terms of the pause increase the eventual probability of faction A's desired outcome, faction B wouldn't consider it an improvement. Two sets of truly sincere, selfless, and intelligent people may still have entirely incompatible value functions.
Historically, formally bipartisan movements that have found relevance have often been torn apart as they decide on the often-binary question of which major political bloc is the more acceptable coalition partner. Others have survived this by more-or-less explicitly taking a side, but have been subsumed completely shortly thereafter. The fate of the Sierra Club springs to mind, with a fracture following David Gelbaum's $200M donation and subsequent ultimatum on immigration. Following the fracture, the victorious faction used its newly established internal power to transform the club into an intersectional, left-leaning activist organization rather than a bipartisan group of environmentalists. It could be argued that the backlash against this shift was catastrophic for the state of environmentalism on the political right, and thus for the political fate of environmentalism in general. It could also be argued that acceptance of Gelbaum's donation and the Sierra Club's subsequent vassalization to a major party allowed it to take actions that it might otherwise have been unable to take.
The debate over whether to support Scott Wiener is a more immediate example. If his name became associated with a political upstart, then empowering that group would amount to empowering him, and many people are categorically unwilling to do that. For those aligned with him politically, consider what his Republican counterpart would look like, and consider whether you would be willing to donate money to a group that proudly endorsed that counterpart's campaign because he reliably backed AI safety bills. Moreover, consider what such a candidate's idea of "Aligned" AI would look like, and whether it is an improvement relative to your current expectations of the future.
If there is a single-issue group that has managed to avoid the fates described and still meaningfully exercise influence, it would be worth investigating how they did so.