I feel like there should be an indicator for posts that have been edited, like the YouTube comments pictured here. It's often important context for a post or comment that it has been edited since its original posting. Maybe even a way to see the diff history? (Though this would be a tougher ask for the site devs.)
Strong disagree; see https://www.lesswrong.com/posts/oKAFFvaouKKEhbBPm/a-bear-case-my-predictions-regarding-ai-progress
This is a "negative" post with hundreds of upvotes and meaningful discussion in the comments. The difference between your post and this one is not the "level of criticism" but the quality and logical basis of the argument. I agree with Seth Herd's argument in the comments of your post regarding the difference here (I can't figure out how to link it). There are many fair criticisms of LessWrong culture, but "biased" and "echo chamber" are not among them, in my experience. I don't mean to attack your character, writing skills, or general opinions, as I'm sure you are capable of writing something of higher quality that better expresses your thoughts and opinions.
Claim: The U.S. government's acquisition of Intel shares should be treated as a weak indicator of how strategically important it considers AI to be going forward.
When an issue is directly political, it is (usually) easy to determine how the government feels by looking at the beliefs of the party in charge. This is a function of how the executive branch works: when appointing the head of a department, the president will select someone who generally believes what they believe, and that person will execute actions based on those beliefs. The "opinion" of the government and the opinion of the president end up being essentially the same in this case. It is much harder to determine the position of the government as a whole when the matter is not directly political. Despite being an entity composed of hundreds of thousands of people, the U.S. certainly has weak or strong opinions on almost all issues. Think of the rules and regulations for somewhat benign things, or the choices and tradeoffs made during a disaster scenario. Determining this opinion can be very important if something you are doing hinges on how the government will act in a given scenario, but it can be somewhat of a dark art without historical examples to fall back on or current data on the actions it has taken so far. If we want to determine the government's position on AI, the best thing we can do is look for indicators in its direct actions relating to AI.
The government's acquisition of 10 percent of Intel seems to me like an indicator of its opinion on the importance of AI. The stated reason for the acquisition was, paraphrased, "We gave Intel free money with the CHIPS Act, and we feel that doing so was wrong, so we decided to instead give all that awarded money plus a little more in exchange for some equity, so that America and Americans can make money off it." I don't think this is wholly untrue, but it feels incomplete and flawed to me. The government directly holding equity in a company is a deeply un-right-wing thing to do, and the excuse of "the deficit" feels too weak and underwhelming to fully justify such a drastic action. I find it plausible that certain people in the government who have political power but aren't necessarily public-facing pushed this through as a way to ensure closer government control of chip production in the event that AI becomes a severe national security risk. Other framings are possible, such as the idea that they want chip fabrication in America for more benign reasons than AI as a security risk, but if so, why would they need to go so far as to take a stake in the company? The difference between a stake and a funding bill like the CHIPS Act is the power that stake gives you to control what goes on within the company, which would be of key importance in a short-to-medium-timeline AGI/ASI scenario.
I believe this is a far stronger indicator than the export controls on chips to China or the CHIPS Act itself. It's simplified but probably somewhat accurate to consider the cost of a government action to be the monetary cost plus the political cost, with the political cost weighted more strongly. Simple export controls have almost zero monetary cost and almost zero political cost, especially when they target a hyper-specific product like a single top-end GPU. The CHIPS Act had a notable monetary cost but almost zero political cost (most people don't know the act exists). The Intel acquisition has a small or even negative monetary cost (if you treat the CHIPS Act money as sunk), but a fairly notable political cost (see this Gavin Newsom tweet as evidence, along with general sentiment among conservatives about the news).
I acknowledge this is a weak indicator, but I believe looking for any indicators of the government's position on AI has value in determining the correct course of action for safety, and especially for policy.
Why would being a lead AI scientist make somebody uninterested in small talk? Working on complex/important things doesn't cause you to stop being a regular adult with regular social interactions!
The question of what proportion of AI scientists would be "interested" in such a conversational topic is interesting and tough; my guess would be very high (~85 percent). To become a "lead AI scientist" you have to care a lot about AI and the science surrounding it, and that generally implies you'll like talking about it and its potential harms/benefits with others! Even if their reaction to x-risk rhetoric is dismissiveness, that opinion is likely something important to them, as it's somewhat of a moral stance: being a capabilities-advancing AI researcher with a high p(doom) is problematic. You can draw a parallel with vegetarianism/veganism: if you eat meat, you have to choose between defending the morality of factory farming, accepting that you are acting immorally, or living with extreme cognitive dissonance. If you are an AI capabilities researcher, you have to choose between defending the morality of advancing AI (downplaying x-risk), accepting that you are acting immorally, or living with extreme cognitive dissonance. I would be extremely surprised if there were a large coalition of top AI researchers who simply "have no opinion" or "don't care" about x-risk, though this is mostly intuition and I'm happy to be proven wrong!
The problem is context length: how much can one truly learn from one's mistakes in 100 thousand tokens, or a million, or 10 million? This quote from Dwarkesh Patel is apt:
How do you teach a kid to play a saxophone? You have her try to blow into one, listen to how it sounds, and adjust. Now imagine teaching saxophone this way instead: A student takes one attempt. The moment they make a mistake, you send them away and write detailed instructions about what went wrong. The next student reads your notes and tries to play Charlie Parker cold. When they fail, you refine the instructions for the next student. This just wouldn’t work. No matter how well honed your prompt is, no kid is just going to learn how to play saxophone from just reading your instructions. But this is the only modality we as users have to ‘teach’ LLMs anything.
If your proposal then extends to "what if we had an infinite context length?", then you'd have an easier time just inventing continual learning (discussed in the quoted article), which is often cited as the largest barrier to a truly genius AI!
You can easily and somewhat cheaply get access to A100s with Google Colab by paying for the Pro subscription or just buying compute credits outright. The "compute credits" they sell are pretty opaque, though; it's hard to say how much usage time you'll get for X credits.
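As a quick sanity check, you can confirm which GPU a Colab runtime actually attached before burning credits. A minimal sketch, assuming the PyTorch install that Colab ships with by default:

```python
# Minimal check of which accelerator the Colab runtime attached.
# Useful because the GPU you get can vary by tier, region, and availability.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name} with {props.total_memory / 1e9:.0f} GB of memory")
else:
    print("No CUDA device attached -- change the runtime type to a GPU.")
```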
The potential need for secrecy/discretion in safety research seems somewhat underexplored to me. We have evidence that models learn information about safety testing performed on them when it is posted online[1], and a big part of modern safety research focuses on detecting misalignment, with subsequent organizational and/or governmental action as the general "plan" if a powerful misaligned model is created. Given these two facts, it seems critically important that models have no knowledge of the frontier of detection and control techniques available to us. This is especially true if we are taking short timelines seriously! Unfortunately, this is somewhat of a paradox, since refusing to publish safety results on the internet would be incredibly problematic from the standpoint of advancing research as much as possible.
I asked this question in a Q&A on the Redwood Research Substack and was given a response suggesting canary strings (a string of text that asks AI developers not to train on the material containing it) as a potential starting point for a solution. This certainly helps to a degree, but I see a couple of problems with the approach. The biggest is simply that any public information will be discussed in countless places, and asking everyone who mentions a given piece of critical information, in ANY CONTEXT, to include a canary string is not feasible. For example, if we were trying to prevent models from learning about Anthropic's 'Alignment Faking in Large Language Models' paper, you'd have to prune all mentions of it from Twitter, Reddit, LessWrong, other research papers, etc. This would clearly get out of hand quickly. The second problem is that this puts the onus on the AI lab to ensure tagged content isn't used in training. That isn't a trivial task, so you would have to trust each of the top labs to (a) recognize this problem as something needing attention and (b) expend the resources needed to guarantee that content carrying a canary string won't be trained on.
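To make the first problem concrete, here is a minimal sketch of what lab-side canary filtering might look like; the marker string and the pipeline shape are my own illustrative assumptions, not any lab's actual implementation:

```python
# Illustrative sketch of lab-side canary filtering over a training corpus.
# The marker below is a made-up placeholder, not a real canary GUID.
CANARY_MARKERS = [
    "EXAMPLE-CANARY-GUID-0000-DO-NOT-TRAIN",
]

def is_trainable(document: str) -> bool:
    """Return False if the document carries any known canary marker."""
    return not any(marker in document for marker in CANARY_MARKERS)

corpus = [
    "A blog post summarizing the alignment-faking results in its own words.",
    "A paper excerpt containing EXAMPLE-CANARY-GUID-0000-DO-NOT-TRAIN.",
]
filtered = [doc for doc in corpus if is_trainable(doc)]
print(len(filtered))  # 1 -- only the tagged excerpt gets dropped
```

Note that the first document, a secondhand summary that never includes the marker, sails straight through the filter, which is exactly the failure mode described above.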
I also recognize that discussing potential solutions to this problem online could be problematic in and of itself, but the ideal solution would be something it is acceptable for a misaligned model to know about (i.e., penetrating the secrecy layer would either be impossible or be such a blatant giveaway of misalignment that doing so is not a viable option for the model).
See Claude 4 system card, "While assessing the alignment of an early model checkpoint, we discovered that the model [i.e. Claude 4] would sometimes hallucinate information from the fictional misaligned-AI scenarios that we used for the experiments in our paper Alignment Faking in Large Language Models. For example, the model would sometimes reference "Jones Foods," the factory-farmed chicken company that was ostensibly involved with its training, or would reference (as in the example below) fictional technical details about how Anthropic trains our models."
Why are we equating high test scores with "high-achieving students" one-to-one? While the correlation is undeniable, it feels overly simplistic to say "there are 19,000 top-scoring students on the SAT/ACT; these are the students who 'deserve' the available 12,000 seats" and build your claim from there. The strongest refutation of this is the simple fact that the difference between two scores can be pure chance, which matters more the closer you are to a perfect score.
So if you suppose that the same proportion of ACT takers who score a 35 or 36 (together 0.895%) would achieve a 1540 on the SAT, then that’s roughly 34,000 students. If there’s an intermediate score threshold of 1550 or 1560 that represents the top 0.5% of students, then about 19,000 students who graduate each year meet that bar.
If the difference between students receiving a 1540 and a 1560 can be that student A guessed correctly between two remaining choices while student B guessed incorrectly,[1] then is it fair to drop the pool of those "qualified" from 34,000 to 19,000 based on this 20-point gap? (A rough simulation of how far guessing luck alone can move a score is sketched below.) You also have to consider indirect luck, where student A encounters an obscure question type they happen to have seen before and can therefore solve trivially, while student B has not. There are also obvious socioeconomic factors: consistent study with incredibly expensive private tutors reliably raises scores; the time investment needed to close the gap between a 1400 and a 1600 is high (children from economically struggling families often have to work, take care of family, or shoulder other responsibilities that cut into available time); and even direct resources matter (I used a 300-dollar calculator on the SAT that could solve algebra natively; it literally handed out the answers to multiple questions and helped greatly on others, and this was explicitly permitted).

I strongly agree that the admissions system is greatly flawed, but in my view this post failed to tackle the problems with the nuance they need. The goal of admissions is (ideally) to give the limited spots to the people who deserve them, but it's incredibly difficult to agree on what parameters define a deserving student, much less on a fair and realistically implementable way to measure those parameters. Despite how much I hate the current admissions system, I believe basing admission decisions solely on exam scores and grades would be a step away from the goal of fairness.
Which it can: due to the way the SAT sections are scaled, a single mistake on certain questions can dock a full 20 points, and conversely you can sometimes get a 1600 with 1 or even 2 incorrect answers.
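Here is a rough Monte Carlo sketch of the guessing-luck point. All scaling numbers are made-up assumptions for illustration (two guessed questions, 10 scaled points each, a 1540 "floor" from questions answered with certainty), not real SAT conversion tables:

```python
import random

# Simulate a near-perfect student: 1540 points locked in from questions they
# know cold, plus two questions where they guess among four answer choices.
# All scaling numbers are illustrative assumptions, not official SAT values.
BASE_SCORE = 1540
GUESSED_QUESTIONS = 2
POINTS_PER_QUESTION = 10
GUESS_SUCCESS_RATE = 0.25
TRIALS = 100_000

def simulated_score() -> int:
    lucky_guesses = sum(
        random.random() < GUESS_SUCCESS_RATE for _ in range(GUESSED_QUESTIONS)
    )
    return BASE_SCORE + lucky_guesses * POINTS_PER_QUESTION

scores = [simulated_score() for _ in range(TRIALS)]
for cutoff in (1550, 1560):
    share = sum(score >= cutoff for score in scores) / TRIALS
    print(f"P(score >= {cutoff}) = {share:.1%}")  # ~43.8% and ~6.3%
```

Under these assumptions, the same student clears a 1560 bar only about 6 percent of the time and a 1550 bar less than half the time, purely from luck on two guesses.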
Ah, somehow I never noticed this, thank you! The 30-minute policy seems good, though it comes with the potential flaw of failing to flag an actual content update if it's done quickly (as happened here). I still think diff history would be cool and would alleviate that problem, though it's rather nitpicky/minor.